MMMs are powerful models, and that power makes them subject to what modelers call “overfitting”. The idea is that you can build a model that fits really, really well to the data it was trained on, but that hasn’t found the actual underlying causal relationships in the data.
Overfitting happens when the model is “too powerful” and fits to noise in the data instead of the signal. The more overfit your model is, the more of its apparently great fit comes from noise, and the more of the real signal it misses.
This article will cover the main reasons behind model overfitting, explore the misleading nature of in-sample R-squared, and touch on out-of-sample validation as a practical way to validate your media mix model.
What Is Overfitting in MMM?
Overfitting is a critical challenge in Marketing Mix Modeling (MMM): it happens when a model captures noise instead of the true underlying relationships in the data. The model is “too powerful,” fitting closely to the training data but failing to generalize to new, unseen data. As a result, it picks up on random fluctuations and patterns that do not reflect the actual causal factors influencing sales.
What that means in practice is that the results we get from the model will be wrong. They will be driven not by the true causal signal in the data, but by noise. And then, when we go to actually use the model to make budget changes, we could end up costing our business millions of dollars.
One common way overfitting happens is when too many independent variables are included in the model. While more variables might seem beneficial, adding them often leads the model to capture noise instead of the true signal.
For example, you can get a high R-squared (the metric that shows the proportion of variation in the dependent variable explained by the model) by simply adding variables made of random noise. These random variables might happen to line up with portions of the sales data and give the illusion of a good fit, but the high R-squared value is misleading: it says nothing about the model’s ability to predict future outcomes accurately.
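To make that concrete, here’s a minimal sketch (made-up weekly data and a plain linear regression rather than a full MMM, so every variable name below is illustrative): we fit a model on the real drivers of sales, then keep bolting on columns of pure random noise and watch the in-sample R-squared climb.

```python
# Illustrative sketch: in-sample R-squared rises as pure-noise regressors are added.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_weeks = 104  # two years of made-up weekly data

# "True" drivers: two media channels plus unexplained noise
tv = rng.gamma(shape=2.0, scale=50.0, size=n_weeks)
search = rng.gamma(shape=2.0, scale=30.0, size=n_weeks)
sales = 1000 + 3.0 * tv + 5.0 * search + rng.normal(0, 200, n_weeks)

X = np.column_stack([tv, search])
print("R-squared with only the real drivers:",
      round(LinearRegression().fit(X, sales).score(X, sales), 3))

# Bolt on columns of pure noise and watch the "fit" improve
for n_noise in (10, 30, 60):
    X_big = np.column_stack([X, rng.normal(size=(n_weeks, n_noise))])
    r2 = LinearRegression().fit(X_big, sales).score(X_big, sales)
    print(f"R-squared with {n_noise} random noise variables added:", round(r2, 3))
```

None of the noise variables has anything to do with sales, yet the in-sample R-squared keeps creeping upward as they are added.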
Another example of overfitting is the use of dummy variables to artificially boost R-squared.
For example, adding meaningless dummy variables on the days when the model has its biggest misses can drive up the R-squared value. This approach might make the model appear to explain most of the variance in the dependent variable, but it actually injects bias and reduces the model’s predictive power by focusing on noise rather than signal.
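Here’s a toy version of that trick (again with simulated data and a simple regression, not an actual MMM): find the weeks where the model misses worst, give each one its own dummy variable, and refit. The in-sample R-squared jumps, even though nothing about the model’s real explanatory power has improved.

```python
# Illustrative sketch: dummying out the model's worst misses inflates R-squared.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_weeks = 104
tv = rng.gamma(2.0, 50.0, n_weeks)
sales = 1000 + 3.0 * tv + rng.normal(0, 300, n_weeks)

X = tv.reshape(-1, 1)
base = LinearRegression().fit(X, sales)
print("R-squared before dummies:", round(base.score(X, sales), 3))

# Find the 15 weeks with the largest residuals and give each its own dummy
residuals = np.abs(sales - base.predict(X))
worst_weeks = np.argsort(residuals)[-15:]
dummies = np.zeros((n_weeks, len(worst_weeks)))
dummies[worst_weeks, np.arange(len(worst_weeks))] = 1.0

X_dummied = np.column_stack([X, dummies])
patched = LinearRegression().fit(X_dummied, sales)
print("R-squared after dummying the worst misses:",
      round(patched.score(X_dummied, sales), 3))
```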
But isn’t having a high R-squared value a good thing?
That’s a common misconception – let’s clear that up.
The Misleading Nature of In-Sample R-Squared
In-sample R-squared is a commonly used metric in Marketing Mix Modeling (MMM) to measure the goodness of fit. It indicates the proportion of the variation in the dependent variable that is explained by the model.
A high in-sample R-squared is frequently misinterpreted as a sign of a superior model. Many believe that a higher R-squared value means the model is better at explaining the data.
However, this is just not true.
In-sample R-squared measures how well the model fits the training data, but it does not guarantee that the model will make accurate predictions on new, unseen data.
We’ve seen that evaluating a model solely on fit metrics like R-squared, MAPE, or RMSE can lead to overfitting. You can have a terrible model that has a really high R-squared, and you can have a great model that has a really low R-squared.
To add to that, traditional fit metrics such as Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) can also be deceptive. Just like R-squared, these metrics tend to improve as more variables are added to the model, but this improvement often comes from capturing random fluctuations rather than meaningful patterns. The model ends up being tailored to the idiosyncrasies of the training data, leading to poor generalization when applied to new data.
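The same pattern shows up if you track MAPE and RMSE on the training data. In the sketch below (simulated data again, plain linear regression), both metrics keep “improving” as noise variables are added, even though none of those variables has anything to do with sales.

```python
# Illustrative sketch: in-sample MAPE and RMSE shrink as noise variables are added.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

rng = np.random.default_rng(7)
n_weeks = 104
tv = rng.gamma(2.0, 50.0, n_weeks)
sales = 1000 + 3.0 * tv + rng.normal(0, 250, n_weeks)

for n_noise in (0, 20, 60):
    noise = rng.normal(size=(n_weeks, n_noise))  # zero columns when n_noise == 0
    X = np.column_stack([tv.reshape(-1, 1), noise])
    pred = LinearRegression().fit(X, sales).predict(X)
    mape = mean_absolute_percentage_error(sales, pred)
    rmse = np.sqrt(mean_squared_error(sales, pred))
    print(f"{n_noise:>2} noise variables -> in-sample MAPE {mape:.3f}, RMSE {rmse:.0f}")
```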
So, if R-squared isn’t a good way to validate your MMM, what’s the right alternative?
Evaluating MMM: The Importance of Out-of-Sample Validation
Out-of-sample validation is crucial for accurately evaluating Marketing Mix Models (MMMs). Unlike in-sample metrics, which measure how well the model fits the training data, out-of-sample metrics provide a more reliable view of a model’s performance: how well it can predict new data it hasn’t seen before. That’s how we can make sure the model has picked up on the true causal relationships.
The idea is simple: we want to evaluate how well our MMM model can predict the future on data it hasn’t seen before.
When we’re taking action based on our MMM model to make decisions about our budget next quarter, we’re implicitly asking the model to predict performance on data it hasn’t seen before. So it’s logical to test how well the model can predict the future!
Here’s how to do it: train the model using data only up to some point in the past (as if it were, say, 3 months ago), and then ask the model to “predict” the next 3 months.
The model hasn’t seen those 3 months of data, but we have, so we can evaluate the model’s forecast accuracy.
It’s called “holdout forecast accuracy” since we have “held out” the last 3 months of data.
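Here’s a bare-bones version of that check (simulated weekly data and a simple regression, with some deliberately useless noise variables thrown in so it can overfit): train on everything except the last 13 weeks, “forecast” those weeks, and compare holdout accuracy to in-sample accuracy.

```python
# Illustrative sketch of a holdout forecast check: train on the past,
# predict the held-out final weeks, and compare accuracy.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(1)
n_weeks, holdout = 156, 13  # ~3 years of weekly data, last ~3 months held out
tv = rng.gamma(2.0, 50.0, n_weeks)
search = rng.gamma(2.0, 30.0, n_weeks)
sales = 1000 + 3.0 * tv + 5.0 * search + rng.normal(0, 200, n_weeks)

# Add noise variables so the model can overfit the training window
X = np.column_stack([tv, search, rng.normal(size=(n_weeks, 60))])

X_train, X_hold = X[:-holdout], X[-holdout:]
y_train, y_hold = sales[:-holdout], sales[-holdout:]

model = LinearRegression().fit(X_train, y_train)
print("In-sample MAPE:",
      round(mean_absolute_percentage_error(y_train, model.predict(X_train)), 3))
print("Holdout MAPE:  ",
      round(mean_absolute_percentage_error(y_hold, model.predict(X_hold)), 3))
```

If the holdout error is dramatically worse than the in-sample error, that’s a strong hint the model has fit to noise.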
This approach is really helpful for building confidence in the model and understanding how well it can actually predict data it hasn’t seen before (and it helps you avoid overfitting). This is one of the ways we validate models here at Recast.
TL;DR:
- Marketing Mix Modeling (MMM) sometimes faces the challenge of overfitting: capturing noise rather than true relationships.
- In-sample R-squared is often misleading: high values can reward overfitting without guaranteeing accurate predictions on new data.
- Overfitting can occur by adding too many variables or using dummy variables to artificially boost R-squared.
- Out-of-sample validation provides a more reliable measure of model performance, ensuring the model captures true causal relationships.