Imagine a brand runs a geo lift test. They shut off ads in three regions for a few weeks, compare results against three control regions where ads keep running, and estimate the lift in sales attributable to the ads from the gap. The model spits out a stat sig result, and the team gets excited.
The experiment reports a 3% lift and an incremental CPA of $75. At face value, that looks promising. But when we check the uncertainty interval, it spans from $25 to $750.
That’s a low-signal result. What went wrong?
First, “any positive impact” isn’t the right goal.
From a business perspective, you don’t care if a marketing channel has any positive impact. You care if it has a profitable one. And lift percentages, on their own, don’t tell you anything about profitability. A 3% lift might sound impressive, but if you had to spend an enormous amount of money to get it, it might be a net negative for the business.
To make incrementality results meaningful, you need to convert lift into either incremental ROI or incremental CPA. That’s the only way to actually know whether a channel is worth continuing to invest in.
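As a concrete illustration, here’s a minimal sketch of that conversion in Python. The baseline sales, spend, and margin figures are made-up assumptions, not results from a real test:

```python
# Minimal sketch: convert a raw lift percentage into incremental CPA and ROI.
# All input values below are illustrative assumptions.

def incremental_metrics(baseline_sales, lift_pct, spend, margin_per_sale):
    """Translate a lift percentage into incremental CPA and incremental ROI."""
    incremental_sales = baseline_sales * lift_pct    # extra conversions attributed to ads
    incremental_cpa = spend / incremental_sales      # cost per incremental conversion
    incremental_roi = (incremental_sales * margin_per_sale - spend) / spend
    return incremental_cpa, incremental_roi

# A 3% lift on 10,000 baseline sales, bought with $22,500 of spend at $60 margin per sale:
cpa, roi = incremental_metrics(baseline_sales=10_000, lift_pct=0.03,
                               spend=22_500, margin_per_sale=60)
print(f"incremental CPA: ${cpa:.0f}, incremental ROI: {roi:.0%}")
# incremental CPA: $75, incremental ROI: -20%
```

In this hypothetical, the same 3% lift that sounded impressive implies a $75 incremental CPA against a $60 margin per sale, i.e. a money-losing channel.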
Second, a point estimate without its uncertainty interval is useless.
You can’t evaluate an incrementality estimate without its uncertainty bounds. We’ve had customers say: “our experiment shows an incremental CPA of $75.” But when we examine the confidence interval, it turns out the true value could be anywhere from $25 to $750.
That’s too wide a range — it’s the difference between a profitable win and a financial loss. And the $75? That’s just the point estimate, not necessarily the most likely value.
If the uncertainty interval spans both profitable and unprofitable outcomes, then the result is not decision-useful. You’d be just as likely to improve your marketing performance by flipping a coin.
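One way to make that rule explicit is to classify the result by where its interval sits relative to your break-even CPA. A minimal sketch in Python, where the $60 break-even figure is an assumption for illustration:

```python
# Minimal sketch: a test result is only decision-useful if its whole CPA
# interval sits on one side of the break-even threshold.

def classify_result(cpa_low, cpa_high, breakeven_cpa):
    """Classify a test by where its incremental-CPA interval sits vs. break-even."""
    if cpa_high < breakeven_cpa:
        return "profitable"      # even the worst plausible CPA clears the bar
    if cpa_low > breakeven_cpa:
        return "unprofitable"    # even the best plausible CPA misses the bar
    return "indeterminate"       # interval spans the threshold: treat as noise

print(classify_result(cpa_low=25, cpa_high=750, breakeven_cpa=60))  # indeterminate
print(classify_result(cpa_low=25, cpa_high=55, breakeven_cpa=60))   # profitable
```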
This is especially common when tests are underpowered — the sample size is too small, the effect size is too small, or the design isn’t robust. And unfortunately, most organizations don’t have internal safeguards to catch these issues, so they end up acting on noise.
Third, a p-value under 0.05 doesn’t mean your result is “true.”
It just means it’s unlikely under a particular statistical model. That model might be wrong. Your priors still matter.
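One quick way to see why priors matter is a back-of-the-envelope Bayes’ rule calculation. The prior and power numbers below are assumptions chosen to mimic an underpowered geo test, not estimates from real data:

```python
# Back-of-the-envelope: how often is a "significant" result actually real?
# All three inputs are illustrative assumptions.
prior = 0.30   # assumed prior probability that the channel truly drives lift
power = 0.20   # assumed power of an underpowered geo test
alpha = 0.05   # false-positive rate at the p < 0.05 threshold

p_significant = prior * power + (1 - prior) * alpha
p_real_given_significant = prior * power / p_significant
print(f"P(effect is real | p < 0.05) = {p_real_given_significant:.0%}")
# P(effect is real | p < 0.05) = 63%
```

Under those assumptions, more than a third of “significant” results are false positives, before you even question whether the statistical model itself is right.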
This is an organizational risk. Acting on noisy results leads to budget reallocations that are not based on signal. When those bets don’t pay off, finance starts questioning the entire measurement framework.
A Framework for Evaluating Test Signal Quality
To avoid this, here’s the framework we use internally at Recast to assess whether a test result is worth acting on:
- Has the lift been translated into a business-relevant metric?
Raw lift percentages are intuitive, but not actionable. You need to convert lift into a metric that maps directly to decision-making — typically incremental ROI or CPA.
At that point, the question isn’t whether there was lift — it’s whether it was worth it.
- Do the confidence intervals rule out unprofitable outcomes?
This is the biggest trap. Teams often fixate on the point estimate (“$75 incremental CPA!”), while ignoring that the 90% interval spans from $25 to $750.
If your uncertainty interval spans both sides of your profitability threshold, you don’t have a signal. You have noise.
- Would this result change your budget allocation?
If the answer is “no,” then why are you even running the test? High-signal results should give you the confidence to reallocate dollars — either to double down or to pull back.
If you’re not willing to move money based on the result, then the test didn’t teach you anything useful. It might have been statistically significant, but it wasn’t strategically significant.
- Was the test designed with enough power to detect meaningful effects?
Most weak results start with weak design. That includes too few regions, short durations, or tests that aim to detect tiny effects that don’t matter anyway.
Design your test with enough power to detect effects that actually matter, and plan for that up front, before launch, rather than after you’ve collected weak results.
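To make that last point concrete, here’s a rough simulation-based power check you could run before launch. The geo counts, noise level, and target lift are placeholder assumptions; in practice you would calibrate them against your own historical data or a proper geo-testing design tool:

```python
# Rough, simulation-based power check for a geo test, run before launch.
# All inputs are placeholder assumptions to be replaced with real data.
import numpy as np
from scipy import stats

def estimated_power(n_geos, n_weeks, weekly_sales, noise_sd, true_lift,
                    n_sims=2_000, alpha=0.05):
    """Share of simulated tests in which the injected lift is detected (p < alpha)."""
    rng = np.random.default_rng(seed=1)
    detections = 0
    for _ in range(n_sims):
        control = rng.normal(weekly_sales, noise_sd, size=(n_geos, n_weeks))
        treated = rng.normal(weekly_sales * (1 + true_lift), noise_sd, size=(n_geos, n_weeks))
        # Compare average weekly sales per geo between the two arms.
        _, p_value = stats.ttest_ind(treated.mean(axis=1), control.mean(axis=1))
        detections += p_value < alpha
    return detections / n_sims

# Three test geos vs. three control geos for four weeks, hoping to detect a 3% lift:
print(estimated_power(n_geos=3, n_weeks=4, weekly_sales=10_000,
                      noise_sd=1_500, true_lift=0.03))
```

With only three geos per arm over four weeks at this noise level, the simulated power comes out very low: the kind of design that produces a $25-to-$750 interval after the fact.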
The bottom line:
It’s easy to confuse statistical significance with business relevance. But if the confidence interval crosses zero ROI, or includes CPAs above your profitability threshold, the result should be treated as indeterminate — not as a win.
That’s a tough pill for many teams to swallow. You spent time and money on the test. But not every experiment delivers useful signal. That doesn’t mean it failed — it means it told you what you don’t know. And that knowledge can be just as valuable, especially when it prevents you from making a bad bet.