A reader asked: “Why should I use a geo testing tool when I can just do the analysis in Google Sheets?”
Eyeballing the data in a spreadsheet is a solid first step. Just know there are several ways spreadsheet analyses can bite you if you use them for more advanced statistical methods like geo testing.
I wrote down five common errors that happen in spreadsheet-based lift tests. These aren’t edge cases – they’re happening right now, and they’re costing real money.
1. You’re comparing before vs. after — and calling it an effect
This is the most common error, and perhaps the most dangerous one because it feels right.
You launch a new advertising campaign for your ride-hailing company in Chicago. Three weeks later, you open your spreadsheet, input rider volume, calculate the percentage change, and present the result: “The campaign drove a 12% lift.”
But what if ride volume in Boston – another city you’re active in where you didn’t run any ads – also went up 10% over the same period? Suddenly that 12% lift in Chicago doesn’t look as impressive, right? Maybe it was the weather, or it was spring break. Or maybe a competitor raised prices across the country and pushed riders your way.
This is the core problem with before-and-after comparisons. You can’t tell if the increase came from your campaign or from external conditions that affected all markets.
Panel A below shows the world as your spreadsheet presents it. It looks like the intervention worked.

Now look at Panel B. Same data, but now we’ve added a control group — markets where the campaign didn’t run. The control group grew at the same rate. The “effect” disappears. What looked like a campaign-driven lift was just an underlying trend.
Many things can contaminate a before-after comparison:
- Seasonality: January e-commerce sales always spike (New Year’s resolutions, gift card redemptions).
- Macro trends: Economic growth or inflation affects all markets simultaneously.
- Competitor actions: A rival’s price increase lifts your sales everywhere.
- External shocks: Pandemic lockdowns, extreme weather, policy changes.
A spreadsheet gives you Panel A. A proper geo test gives you Panel B. And the difference between the two is the difference between thinking your campaign worked and knowing whether it did.
Now Panel A of this second figure shows what happens when the campaign does work — but the spreadsheet still misleads you. The treated group grew, yes, but so did the control group (just less). The naive before-after view overstates the effect by attributing the entire increase to the campaign, when part of it was just the natural trend. Panel B separates the signal from the noise: the true treatment effect is the additional growth beyond what the control group experienced.

The bottom line is that without a control group, you’re crediting your campaign for growth that would have happened anyway — and making budget decisions on that false credit.
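To make that concrete, here’s a minimal Python sketch in the spirit of the Chicago/Boston example, with made-up numbers. It’s a crude difference-style comparison, not what a real geo testing tool does under the hood, but it shows how a control market changes the answer:

```python
# Minimal sketch: naive before/after lift vs. a control-adjusted comparison.
# All numbers are invented, loosely following the Chicago/Boston example above.

chicago_before, chicago_after = 100_000, 112_000   # rides in the test market
boston_before, boston_after = 80_000, 88_000       # rides in the control market (no ads)

# What the spreadsheet shows: before vs. after in the test market only.
naive_lift = chicago_after / chicago_before - 1         # 0.12 -> "12% lift"

# What a control group reveals: subtract the growth you'd have seen anyway.
control_growth = boston_after / boston_before - 1       # 0.10 -> background trend
adjusted_lift = naive_lift - control_growth             # ~0.02 -> the part you can credit to ads

print(f"Naive before/after lift: {naive_lift:.1%}")
print(f"Control market growth:   {control_growth:.1%}")
print(f"Control-adjusted lift:   {adjusted_lift:.1%}")
```

Even this toy version turns a headline “12% lift” into roughly 2% once the background trend is accounted for. Real tools go further, building a proper counterfactual instead of subtracting one control market.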
2. You picked your test and control markets by gut feel
“Let’s run the test in California and compare it to the rest of the country,” says the agency managing your marketing spend. This sounds reasonable: California is a big market, easy to measure, and your team already has good data there. But it isn’t.
The problem is that California differs from the rest of the U.S. in consumer behavior, media costs, demographics, and seasonal patterns — in ways that dwarf most campaign effects. Your test result ends up measuring the difference between California and everywhere else, not the difference between running ads and not running them.
This isn’t just a California problem. Any time you pick markets based on convenience, familiarity, or “they feel comparable,” you’re introducing noise that obscures the real signal. Two markets that look similar can behave very differently in terms of purchasing patterns, competitive landscape, and media saturation.
Proper geo testing tools select markets algorithmically, optimizing for pre-test similarity — choosing test and control groups that tracked each other closely before the experiment, so that any divergence after it is more likely to reflect the campaign’s actual effect. A spreadsheet can’t do this. A human picking markets on instinct can’t do it either.
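If you want a feel for what “optimizing for pre-test similarity” means, here’s a rough Python sketch that ranks invented candidate markets by how closely their pre-test sales track the test market. The market names, series, and the use of plain correlation are all illustrative assumptions; real tools use far more robust matching (and methods like synthetic controls), but the intuition is the same:

```python
import numpy as np

# Rank candidate control markets by pre-test similarity to the test market.
# All series below are simulated for illustration only.
rng = np.random.default_rng(0)

test_market = rng.normal(100, 5, size=52)                  # 52 weeks of pre-test sales
candidates = {
    "market_a": test_market + rng.normal(0, 3, size=52),   # shadows the test market closely
    "market_b": rng.normal(120, 15, size=52),               # different level, unrelated noise
    "market_c": 0.9 * test_market + rng.normal(0, 8, size=52),
}

def pre_test_similarity(test, candidate):
    """Correlation of weekly sales in the pre-test window (higher = better match)."""
    return np.corrcoef(test, candidate)[0, 1]

ranked = sorted(candidates.items(),
                key=lambda kv: pre_test_similarity(test_market, kv[1]),
                reverse=True)
for name, series in ranked:
    print(f"{name}: correlation with test market = {pre_test_similarity(test_market, series):.3f}")
```

Correlation alone ignores differences in level and trend, which matter too — the point is simply that “comparable” should be something you measure in the pre-test data, not something you assert in a meeting.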
The bottom line: when your test and control groups aren’t properly matched, your results reflect market differences, not campaign performance. You might scale a campaign nationally based on something that only worked because California is California. And with that much noise, only a massive lift will stand out.
3. You have no idea if your test can detect anything
Before running any test, ask yourself: is this test actually sensitive enough to detect the effect size you’d realistically expect?
This is called power analysis. Given your expected lift (statisticians call it the ‘effect size’), the variance across your markets, and the duration of the test, it tells you whether your experiment has a realistic chance of detecting an effect — if one actually exists.
Think of it this way: if you’re trying to read a street sign from a mile away with weak binoculars, the sign is there, but you’ll never see it. Your conclusion isn’t “there’s no sign.” It’s “my equipment can’t detect it.” The same thing happens with underpowered experiments: even when there is a real lift, you may not be able to detect it because your experiment isn’t sensitive enough.
If your expected lift is 3% but your test setup can only detect effects of 15% or larger, you’re guaranteed an inconclusive result. You’ll spend weeks of operational effort and budget, then conclude “the campaign didn’t work” — when the real conclusion is “the test was blind.”
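To see how you can check this before launch, here’s a rough simulation-based power check in Python. Everything in it (normally distributed weekly sales, the noise level, a plain two-sample t-test) is a simplifying assumption, not how a real geo testing tool estimates power, but it illustrates the question you should be asking before spending a dollar:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def estimated_power(true_lift, weeks=4, n_test=5, n_control=5,
                    baseline=100.0, noise_sd=15.0, alpha=0.05, sims=2000):
    """Share of simulated experiments in which a given true lift is detected."""
    detected = 0
    for _ in range(sims):
        # Simulate per-market average weekly sales for control and treated markets.
        control = rng.normal(baseline, noise_sd, size=(n_control, weeks)).mean(axis=1)
        treated = rng.normal(baseline * (1 + true_lift), noise_sd, size=(n_test, weeks)).mean(axis=1)
        result = stats.ttest_ind(treated, control)
        detected += result.pvalue < alpha
    return detected / sims

print(f"Power to detect a 3% lift:  {estimated_power(0.03):.0%}")   # likely far below the usual 80% bar
print(f"Power to detect a 15% lift: {estimated_power(0.15):.0%}")
```

If the power for your realistic effect size comes out at 20%, the test is blind before it starts — longer duration, more markets, or a bigger expected effect is needed.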
Here’s the danger: a proper geo testing tool will tell you not to run the test if power is insufficient. It will say: “Given your markets, your expected effect size, and your test duration, this experiment can’t give you a reliable answer. Don’t waste your time.” A spreadsheet will never tell you that. It will happily compute a result no matter how meaningless the setup.
The bottom line: without power analysis, you’re flying blind. You kill campaigns that are actually working, or you keep running tests that were never able to give you an answer.
4. You got a number — but no range of uncertainty around it
After your spreadsheet calculations, you got a number: say, a 5% lift. But how confident are you in that number? Is the true effect somewhere between 0.5% and 9.5%? Or tightly bounded between 4.8% and 5.2%?
Those two scenarios look identical in a spreadsheet — both show “5%.” But they’re radically different situations for a budget decision. In the first case, the campaign might have barely moved the needle or might have driven nearly 10% lift — you genuinely don’t know. In the second, you have a tight, reliable estimate you can act on.
Without uncertainty intervals (confidence intervals in a frequentist framework, credible intervals in a Bayesian one), you’re making million-dollar allocation decisions based on a single point estimate with no sense of how much to trust it.
When the uncertainty range is wide, teams ignore it and just use the average: “The intervals are too wide so I just look at the mean.” That’s exactly backwards! The width of the interval is the information. If the incremental CPA ranges from $40 to $900, you can’t make a budget decision based on $150. The test is telling you: “I don’t have a clear answer yet.”
A proper geo testing tool produces a full distribution of plausible outcomes. A spreadsheet generally produces one number and hides all the uncertainty behind it.
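For a back-of-the-envelope sense of that range without a full tool, a bootstrap is one rough option. Here’s a minimal Python sketch with invented per-market lift numbers; a real geo testing tool would model this properly, but even the crude version shows how much wider the truth is than a single cell:

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented lift estimates from individual test markets (for illustration only).
market_level_lifts = np.array([0.02, 0.09, 0.01, 0.07, 0.06, 0.03, 0.08, 0.04])

point_estimate = market_level_lifts.mean()   # the single number a spreadsheet reports

# Resample markets with replacement to see how much the estimate could move.
boot_means = np.array([
    rng.choice(market_level_lifts, size=len(market_level_lifts), replace=True).mean()
    for _ in range(10_000)
])
low, high = np.percentile(boot_means, [2.5, 97.5])

print(f"Point estimate: {point_estimate:.1%}")
print(f"95% interval:   {low:.1%} to {high:.1%}")   # the width is the information
```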
The bottom line: your next-quarter forecast will be built on false precision. The number looks solid, but there’s nothing holding it up.
5. Your ROI estimate isn’t the ROI that matters
Without a control group, your spreadsheet may calculate an ROI: total revenue in the test market divided by total ad spend. But that revenue includes sales that would have happened anyway, with or without the campaign — organic demand, returning customers, seasonal tailwinds. You’re dividing revenue you didn’t cause by spend you did incur.
What you actually need is incremental ROI: the additional revenue generated only because of the campaign, divided by what you spent to generate it. That’s the marginal return — the number that answers the real question: is the next dollar of spend worth it?
The gap between total ROI and incremental ROI can be enormous. A spreadsheet that says “4x ROI” might really be “1.2x incremental ROI” once you subtract the baseline revenue that would have come in with or without the ads. And the difference between 1.2x and 4x is the difference between “cut this channel” and “double the budget.”
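The arithmetic itself is trivial once you have a counterfactual baseline. A minimal sketch, with invented numbers chosen to match the 4x vs. 1.2x example above:

```python
# Minimal sketch of the gap between total and incremental ROI.
# All figures are invented for illustration.
ad_spend = 250_000
test_market_revenue = 1_000_000    # everything the test market brought in
baseline_revenue = 700_000         # what the control group says would have come in anyway

total_roi = test_market_revenue / ad_spend                              # 4.0x -- what the spreadsheet reports
incremental_roi = (test_market_revenue - baseline_revenue) / ad_spend   # 1.2x -- what actually matters

print(f"Total ROI:       {total_roi:.1f}x")
print(f"Incremental ROI: {incremental_roi:.1f}x")
```

The hard part isn’t the division; it’s getting a credible baseline, and that’s exactly what a control group gives you and a before/after spreadsheet doesn’t.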
The bottom line: you’re making allocation decisions based on inflated ROI that credits the campaign for revenue it didn’t generate. That’s not measurement. It’s accounting in a favorable light.
But hey, the spreadsheet isn’t the enemy
Let me clear the air: spreadsheets aren’t bad tools. They’re great for tracking spend, organizing test calendars, and reporting results. The problem is using them as the analytical engine for causal inference — for answering the question “did this campaign actually cause additional revenue?”
That question requires a counterfactual, proper market selection, power analysis, uncertainty quantification, and a clean separation between revenue you caused and revenue that was coming anyway. No spreadsheet formula can give you that.
The five errors above aren’t obvious, and that’s what makes them dangerous. The spreadsheet gives you a clean number and no indication that anything went wrong.
So if you’re running geo tests or experiments in general, bring these four questions to your next meeting:
- Does our analysis include a control group?
- Did we run a power analysis before launching the test?
- Are we reporting incremental ROI or just standard ROI?
- Do we have uncertainty ranges around our estimates?
If the answer to any of these is “no” or “I don’t know”, it might be time to leave the spreadsheet.