How to do holdout accuracy and backtesting for your MMM (and why it matters)

You don’t just want an MMM model. You want a model that can be trusted. 

But because MMM models are so flexible, they are also prone to overfitting.

So, how can we build confidence in the model and the recommendations we’re making to the marketing team? 

The answer: backtesting / holdout forecast accuracy.

What is holdout accuracy / backtesting?

The idea is simple: we want to evaluate how well our MMM model can predict the future. So we train the model using data up to some point in the past (as if it were, say, 3 months ago), and then we ask the model to “predict” the next 3 months.

The model hasn’t seen those 3 months of data, but we have, so we can evaluate the model’s forecast accuracy. It’s called “holdout forecast accuracy” since we have “held out” the last 3 months of data.

This is useful because, if we do it correctly, the exercise tells us how well we can expect to forecast the next 3 months that are truly in the future.
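To make the mechanics concrete, here’s a minimal Python sketch of a single holdout check. It is a sketch under stated assumptions, not how any particular MMM does it: weekly data, a one-channel linear model standing in for a real MMM, MAPE as the accuracy metric, and holdout-period spend treated as known in advance (it’s the budget whose effect we want to forecast). The function names and the data are hypothetical.

```python
# Minimal holdout-forecast sketch. Hypothetical throughout: a one-channel
# linear model stands in for a full MMM, and the data are synthetic.

def fit_linear(spend, sales):
    """Fit sales = a + b * spend by ordinary least squares; return (a, b)."""
    n = len(spend)
    mean_x = sum(spend) / n
    mean_y = sum(sales) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales))
    var = sum((x - mean_x) ** 2 for x in spend)
    b = cov / var
    return mean_y - b * mean_x, b


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def holdout_mape(spend, sales, holdout_periods):
    """Train on all but the last `holdout_periods` points, then forecast them.

    The model sees holdout *spend* (assumed known in advance) but never
    holdout *sales*, which is what it is scored on.
    """
    cutoff = len(sales) - holdout_periods
    a, b = fit_linear(spend[:cutoff], sales[:cutoff])
    forecast = [a + b * x for x in spend[cutoff:]]
    return mape(sales[cutoff:], forecast)


# Usage: 52 weeks of synthetic data, holding out the last 13 weeks (about 3 months).
spend = [50 + 10 * (i % 7) for i in range(52)]
noise = [3 * (-1) ** i for i in range(52)]  # deterministic stand-in for noise
sales = [100 + 2 * s + e for s, e in zip(spend, noise)]
print(f"13-week holdout MAPE: {holdout_mape(spend, sales, 13):.2f}%")
```

The key design point is the split: everything the model is scored on sits after the cutoff, and nothing after the cutoff is used for fitting.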

Another application of this idea outside of marketing is in stock trading.

When a hedge fund is evaluating a new stock-trading algorithm, how do they evaluate if it will work or not? How do they know if it will actually make them money?

Hedge funds have sophisticated backtesting systems: environments where they can “test” their algorithms on historical data to see how they would have performed in the past, and so whether they would have made money.

If the backtesting results are positive, then that gives the fund conviction that they can deploy this new algorithm or strategy in the real world to generate real profits.

MMM is a causal inference problem where the true relationships we care about measuring are unknown and unknowable. What we want to validate is whether the causal relationships estimated by the model actually hold in the real world. Holdout forecast accuracy can help us do this.

MMM and Causal Inference

So, when can holdout forecast accuracy actually help us validate whether the model has truly estimated the underlying causal relationships?

At Recast we like to say: if the model can consistently forecast the future in a system without information leakage in the face of exogenous budget changes, then the model likely has picked up the true causal relationships.

So let’s break down what this means:

  • Consistently means we can do it over and over again in different time periods. We aren’t just “getting lucky” with a single snapshot in time.
  • Without information leakage means that we’ve set up the forecasting “challenge” correctly: the model has no access to information we wouldn’t have today if we were making a forecast for the next 3 months.
  • In the face of exogenous budget changes means that we are making predictions while the system itself is being manipulated. If the budget isn’t changing, a model could make good forecasts by picking up correlation instead of causation. We need to show that we can make good forecasts when things are changing to know that we’ve gotten the model right.
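The third condition is easy to underrate, so here’s a small hypothetical illustration. It assumes sales respond linearly to spend and scores a deliberately “dumb” forecast that ignores spend entirely and just repeats the recent average. When the budget is flat, that spend-blind forecast is perfect, so a good holdout score there says nothing about causation; when the budget steps up, it misses. The helper names and the data are assumptions for illustration only.

```python
# Hypothetical illustration: a spend-blind forecast looks perfect on a flat
# budget, so only holdouts with real budget changes can test causal claims.

def relative_spend_change(spend, holdout_periods):
    """Relative change in mean spend: holdout window vs. the same-length window before it."""
    cutoff = len(spend) - holdout_periods
    before = spend[cutoff - holdout_periods:cutoff]
    after = spend[cutoff:]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return (mean_after - mean_before) / mean_before


def spend_blind_mape(sales, holdout_periods):
    """MAPE (%) of forecasting every holdout period as the trailing-window mean of sales.

    This forecast uses no spend information at all, so it carries no causal content.
    """
    cutoff = len(sales) - holdout_periods
    level = sum(sales[cutoff - holdout_periods:cutoff]) / holdout_periods
    actual = sales[cutoff:]
    return 100 * sum(abs(y - level) / y for y in actual) / len(actual)


def weekly_sales(spend):
    """Assumed response: baseline of 100 plus 2 units of sales per unit of spend."""
    return [100 + 2 * s for s in spend]


flat = [100] * 52                  # budget never changes
step = [100] * 39 + [150] * 13     # budget rises 50% in the final 13 weeks

for name, spend in [("flat", flat), ("step", step)]:
    change = relative_spend_change(spend, 13)
    error = spend_blind_mape(weekly_sales(spend), 13)
    print(f"{name}: spend change {change:+.0%}, spend-blind MAPE {error:.1f}%")
```

In practice, a check like `relative_spend_change` could be used to tag which backtest windows actually contained meaningful budget movement; what counts as “meaningful” is a judgment call.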

So let’s talk about what a good holdout forecast accuracy system looks like:

The system should be continuous. 

Holdout forecast accuracy isn’t a “one and done” check but rather should be applied at multiple time points during the initial model build, and then in an ongoing way as the model is refreshed.

When the model is first built, we recommend, at a bare minimum, backtesting it as of 30, 60, and 90 days ago.
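A sketch of that minimum check, assuming daily data: for each cutoff (30, 60, and 90 days before the latest date), refit using only data before the cutoff and score the forecast from the cutoff through today. The `fit` argument is a hypothetical hook where a real model would go; a one-channel linear model is plugged in purely for illustration.

```python
# Rolling-origin backtest sketch: refit as of several past cutoffs and score each
# forecast from its cutoff through the latest observed day. Model and data are hypothetical.

def backtest(spend, sales, fit, days_ago=(30, 60, 90)):
    """Return {days_ago: MAPE %} for forecasts made as of each past cutoff.

    `fit(spend, sales)` must return a function mapping spend values to predicted sales.
    """
    if max(days_ago) >= len(sales):
        raise ValueError("not enough history for the requested cutoffs")
    results = {}
    for d in days_ago:
        cutoff = len(sales) - d
        predict = fit(spend[:cutoff], sales[:cutoff])  # only data before the cutoff
        forecast = predict(spend[cutoff:])
        actual = sales[cutoff:]
        results[d] = 100 * sum(abs(y - f) / y for y, f in zip(actual, forecast)) / d
    return results


def linear_fit(spend, sales):
    """Illustrative stand-in for an MMM: OLS of sales on a single spend series."""
    n = len(spend)
    mx, my = sum(spend) / n, sum(sales) / n
    b = sum((x - mx) * (y - my) for x, y in zip(spend, sales)) / sum((x - mx) ** 2 for x in spend)
    a = my - b * mx
    return lambda xs: [a + b * x for x in xs]


# Usage: a year of synthetic daily data where the budget changes every 30 days.
spend = [200 + 50 * ((day // 30) % 3) for day in range(365)]
sales = [500 + 1.5 * s + 10 * ((-1) ** day) for day, s in enumerate(spend)]
for d, score in backtest(spend, sales, linear_fit).items():
    print(f"as of {d} days ago: MAPE {score:.2f}%")
```

Passing the model in as a function keeps the backtest harness independent of the model, so the same harness can be rerun unchanged at every refresh.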

From that initial build forward, you want to make sure your model’s predictions keep performing well, so that as the marketing team acts on the model’s recommendations, you can see how well those predictions hold up. Ideally, this is built into a fully automated system so that you and your stakeholders can easily monitor performance.
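One way such automation could look, sketched under assumptions: each model refresh logs its forward-looking forecasts, and as actuals arrive, they are scored against what the model said before the fact. The class, its fields, and the 15% alert threshold are all hypothetical; a real system would persist the log and feed the scorecard into a dashboard.

```python
from dataclasses import dataclass, field


@dataclass
class ForecastRecord:
    made_on: str      # refresh date when the forecast was made
    period: str       # period being forecast
    forecast: float


@dataclass
class ForecastMonitor:
    """Hypothetical ledger that scores each refresh's forecasts once actuals arrive."""
    alert_mape: float = 15.0                      # arbitrary alert threshold, in percent
    records: list = field(default_factory=list)   # ForecastRecord entries
    actuals: dict = field(default_factory=dict)   # period -> observed value

    def log_forecast(self, made_on, period, forecast):
        self.records.append(ForecastRecord(made_on, period, forecast))

    def log_actual(self, period, value):
        self.actuals[period] = value

    def scorecard(self):
        """MAPE (%) per refresh date, using only periods whose actuals are in."""
        errors = {}
        for r in self.records:
            if r.period in self.actuals:
                actual = self.actuals[r.period]
                errors.setdefault(r.made_on, []).append(abs(actual - r.forecast) / abs(actual))
        return {made_on: 100 * sum(e) / len(e) for made_on, e in errors.items()}

    def alerts(self):
        """Refresh dates whose forecasts missed by more than the threshold."""
        return [made_on for made_on, score in self.scorecard().items() if score > self.alert_mape]


# Usage with made-up numbers.
monitor = ForecastMonitor()
monitor.log_forecast("2024-01-01", "2024-01", 1000.0)
monitor.log_forecast("2024-01-01", "2024-02", 1100.0)
monitor.log_forecast("2024-02-01", "2024-03", 1500.0)
monitor.log_actual("2024-01", 1050.0)
monitor.log_actual("2024-02", 1000.0)
monitor.log_actual("2024-03", 1000.0)
print(monitor.scorecard())
print("needs review:", monitor.alerts())
```

Because forecasts are logged at refresh time and never edited afterward, the scorecard measures genuine out-of-sample performance rather than a re-fit.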

The ideal system should take great pains to avoid information leakage

Information leakage just means that the model is able to “cheat” by using information that wouldn’t actually be available at the time the forecast is made. This is a really thorny problem in marketing mix models because of how dynamic and interrelated the components of the system are.

One good example of information leakage in MMMs happens when the model includes a variable like “website traffic” as an independent variable. The problem with including website traffic in the model is that it’s not truly independent of the dependent variable. If the conversion rate is constant and you tell me how many sessions you had, it’s a trivial calculation for me to tell you exactly how many conversions you had!

This means that if you include website traffic in the holdout forecasting exercise, you’re allowing the model to “cheat” since there’s information leakage from the website traffic variable to the dependent variable we’re trying to predict. This problem also occurs with variables like “branded search” and affiliate programs. If you’re including branded search spend in your holdout forecasts, you are likely helping your model to cheat!
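The website-traffic example takes only a few lines to reproduce. The sketch below assumes a constant 2% conversion rate and made-up weekly numbers: the conversion rate implied by the training weeks turns sessions into a “perfect” holdout forecast without the model learning anything about marketing. The guard function and the column names in `LEAKY_REGRESSORS` are hypothetical; which variables belong on that list is a modeling judgment.

```python
# Hypothetical demo of leakage through website sessions, plus a simple guard.

conversion_rate = 0.02                        # assumed constant
sessions = [10_000, 12_500, 9_000, 15_000]    # made-up weekly sessions
conversions = [s * conversion_rate for s in sessions]

# "Train" on the first three weeks, then "forecast" week four from its sessions.
implied_rate = sum(conversions[:3]) / sum(sessions[:3])
forecast = sessions[3] * implied_rate
print(f"forecast {forecast:.1f} vs actual {conversions[3]:.1f}")  # matches: the target leaked in

# Guard: drop regressors that largely restate the outcome (the examples named in the text).
LEAKY_REGRESSORS = {"website_sessions", "branded_search_spend", "affiliate_spend"}


def drop_leaky_regressors(features):
    """Return a copy of a {column_name: values} mapping without known leaky columns."""
    return {name: values for name, values in features.items() if name not in LEAKY_REGRESSORS}


features = {
    "tv_spend": [5_000, 7_500, 5_000, 9_000],
    "website_sessions": sessions,
    "branded_search_spend": [800, 950, 700, 1_200],
}
print(sorted(drop_leaky_regressors(features)))
```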

Another source of information leakage can occur in multi-stage models where only one stage is held out, but the other stages aren’t!

Facebook’s open-source Robyn package suffers from this problem: its “validation” accuracy doesn’t reflect a “true” holdout forecast, because the holdout data are included in the first-stage model that estimates the baseline and time-series components of the model (via Prophet).

This allows information to leak in from the dependent variable, so it’s very difficult to get a sense of Robyn’s true holdout accuracy!
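Here’s a hypothetical two-stage sketch of that failure mode. Stage 1 fits a baseline (a plain linear time trend, standing in for a richer time-series stage such as Prophet); stage 2 regresses the de-trended sales on spend. In the leaky variant, stage 1 is fit on the full series, holdout included, while stage 2 is still restricted to the training window. The synthetic data contain an unexplained jump in baseline sales during the holdout, and the leaky variant scores better only because its baseline has already seen the holdout sales. This is not Robyn’s code; the functions and data are illustrative assumptions.

```python
# Hypothetical two-stage model showing how a first stage fit on holdout data
# leaks the answer into the "holdout" score. Not Robyn's implementation.

def ols(x, y):
    """Fit y = a + b * x by ordinary least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def two_stage_holdout_mape(spend, sales, holdout, leak_baseline=False):
    """Holdout MAPE (%) of a trend-baseline + spend model.

    Stage 2 always sees only training data. Stage 1 (the baseline) does too,
    unless leak_baseline=True, in which case it is fit on the full series.
    """
    n = len(sales)
    cutoff = n - holdout
    t = list(range(n))
    stage1_end = n if leak_baseline else cutoff
    a1, b1 = ols(t[:stage1_end], sales[:stage1_end])            # stage 1: baseline trend
    baseline = [a1 + b1 * ti for ti in t]
    residual = [y - base for y, base in zip(sales[:cutoff], baseline[:cutoff])]
    a2, b2 = ols(spend[:cutoff], residual)                      # stage 2: media effect
    forecast = [baseline[i] + a2 + b2 * spend[i] for i in range(cutoff, n)]
    actual = sales[cutoff:]
    return 100 * sum(abs(y - f) / y for y, f in zip(actual, forecast)) / holdout


# Synthetic weekly data: the baseline jumps from 100 to 160 in the final 13 weeks
# for a reason no variable in the model explains.
spend = [50 + 10 * (i % 7) for i in range(52)]
base = [100 if i < 39 else 160 for i in range(52)]
sales = [b + 2 * s for b, s in zip(base, spend)]

honest = two_stage_holdout_mape(spend, sales, 13)
leaky = two_stage_holdout_mape(spend, sales, 13, leak_baseline=True)
print(f"honest holdout MAPE {honest:.1f}% vs leaky {leaky:.1f}%")
```

The leaky number looks better, but it is not evidence of a better model: it only reflects that the holdout sales were fed into stage 1.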

So once you have a system for measuring holdout forecast accuracy and backtesting, you just need to make it continuous! This continuous measurement gives you and your stakeholders confidence that when you recommend today how they should change their budget next month, you won’t be leading them astray, since you’ve shown that you can forecast accurately over and over again.

As we’ve said before: model validation is all about continuously validating the MMM results from multiple angles. 

Ask your MMM vendor about holdout accuracy and backtesting:

When deciding which vendor should run your model, make sure you ask to see their holdout predictions.

Bad answers from your MMM vendor are:

Anything to the effect that they don’t do it, that you shouldn’t worry about it, or that they focus only on in-sample fit or in-sample mean squared error. All of these are signs that they’re trying to hide the ball.

With a complex statistical model, you can make the results say anything that you want if you try hard enough. Unfortunately, that doesn’t actually help to drive the business forward. 

And so our holdout predictive accuracy checks are, in large part, what prevent us from pulling those sorts of shenanigans. If we “make” the model say TV is really effective but it actually isn’t, we’re going to miss on our holdout forecasts consistently.

Modelers aren’t there to tell the story that makes people look good, but to deliver results that can actually drive the business forward.

Good answers are:

Discussions around true out-of-sample forecast accuracy. That way you can understand if the model can consistently predict the future and whether you have good reason to trust it and make decisions based on it or not.

About The Author