Considerations When Designing GeoLift Tests

GeoLift by Recast is built to help you design and analyze rigorous geographic incrementality experiments. However, when using GeoLift (or any similar tool) there’s one issue you should always be aware of: the problem of “generalizability”.

The problem of “generalizability” plagues all types of experimental design – researchers and scientists are always worried that what happens in the lab or on their specific instruments might not “generalize” to other labs, other instruments, or to the “real world”. This problem can be especially severe when designing geographic incrementality experiments to evaluate marketing effectiveness, since 1) we’re often working with fairly small sample sizes and 2) marketing performance is constantly changing.

So we have two dimensions of generalizability that we should be thinking about:

  • Generalizability from our sample to the population. Just because we observe a lift in our sample, can we expect to observe the same lift when we scale to a national campaign?
  • Generalizability over time. Just because we observed a lift in March, will we get the same lift in April? Or in August?

So let’s dig into how you might think about these two generalizability problems.

Generalizability from our sample to the population

The smaller the sample size in your treatment group, the less generalizable the experiment will be. In general, when we’re analyzing the results of an experiment, we want the sample to be representative of the population. The more representative the sample is, the better we should expect the experiment to generalize to the whole population.

In the case of a clinical trial in healthcare, we want to confirm that the demographics of our experimental sample match the population as a whole. We might check:

  • % men vs women
  • Distribution of weights (or % overweight)
  • Geographic distribution (don’t want all California or all New York)
  • Etc.

Once we confirm that the sample matches our population as a whole we will have more confidence that the results will generalize. If our sample doesn’t match the population as a whole then we will have to take care in interpreting the results. If the treatment was only tested on men, then we wouldn’t want to just assume that it will have the same effect on women!
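One simple way to operationalize this check is to compare each demographic share in the sample against the population and flag the largest gap. The sketch below is purely illustrative – the category names and numbers are made up, and the 10-point threshold is an arbitrary assumption, not a GeoLift feature:

```python
# Hypothetical representativeness check: compare the demographic mix of
# an experimental sample against the population. All category names and
# shares below are invented for illustration.

def max_imbalance(sample_shares, population_shares):
    """Largest absolute gap (in share points) between sample and population."""
    return max(abs(sample_shares[k] - population_shares[k])
               for k in population_shares)

population = {"men": 0.49, "women": 0.51, "overweight": 0.42}
sample = {"men": 0.70, "women": 0.30, "overweight": 0.45}

gap = max_imbalance(sample, population)
print(f"largest imbalance: {gap:.2f}")  # 0.21 – men are heavily over-represented

# An (arbitrary) rule of thumb: worry if any share is off by >10 points.
representative = gap <= 0.10
```

Here the sample skews heavily male, so – just as in the clinical-trial example – we’d be cautious about assuming the measured effect applies to everyone.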

In the context of marketing incrementality experiments we have the same problem. If our only treatment geographies are Austin and Dallas (both cities in Texas in the United States) then we might feel comfortable saying our results should generalize well to Houston (another city in Texas) but maybe not to the rest of the country. 

So when we’re designing our experiment we need to keep this tradeoff in mind: smaller experiments will require less investment (or lower opportunity costs) but that comes at the cost of reduced generalizability. 

This is a tradeoff and every business is going to think about this tradeoff differently depending on their strategy and risk tolerance. It might be the case that for a critically important experiment you want to be very certain the results are going to generalize so it’s worth it to invest more in a larger, more representative experiment. But it could also be the case that in the interest of moving quickly and testing more channels more rapidly it’s worth launching tests with some risk that won’t fully generalize.

In GeoLift by Recast you can control the number of geographies in your treatment group using the slider in the left-hand toolbar.

Generalizability over time

Marketing performance changes over time. Your creative changes, your targeting strategy changes, your products improve, Zuck changes the algorithm, your competitors change their price, and many more things can all impact the return on investment of your campaign.

This means that the results from an experiment run 12 months ago may not apply (at all) to your current performance. Other factors, like seasonality or one-off events, can also dramatically limit how far into the future your experimental results apply.

For example, you could design the perfect incrementality experiment and run it during your Black Friday holiday promotion. That experiment will tell you a lot about how your marketing campaign performs during Black Friday but very little about how you can expect your marketing campaigns to perform at other, non-promotional times.

Similarly, if you have a highly seasonal business (like an ice cream shop), running an experiment in the winter (low demand) will not tell you much about how your marketing campaign might perform in the summer (high demand).

This doesn’t mean you shouldn’t run experiments; it just means you need to be thoughtful about what question you’re trying to answer and make sure that your experiment is set up to actually answer that question.

When using GeoLift by Recast, if you’re designing an experiment for the peak season, make sure you design it using data from previous peak seasons, not just the last four months (if those months were the low season). You can control this by simply changing which dates are included in the data upload you use for analysis.
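If you prepare your upload programmatically, restricting it to prior peak seasons is a one-line filter. A minimal sketch in pandas, assuming a daily dataset with a `date` column and treating November–December as the (hypothetical) peak season:

```python
import pandas as pd

# Hypothetical daily dataset; in practice this would be your own export.
df = pd.DataFrame({
    "date": pd.date_range("2022-01-01", "2023-12-31", freq="D"),
})
df["revenue"] = 100.0  # placeholder metric column

# Keep only November and December from every year, so the design data
# reflects peak-season dynamics rather than the most recent low season.
peak = df[df["date"].dt.month.isin([11, 12])]
print(len(peak))  # 122 rows: Nov+Dec of 2022 and 2023
```

The same idea applies to any seasonality pattern: filter the upload to the dates that resemble the period you actually plan to run the campaign in.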
