Why Internal Validation Metrics Are the Wrong Way to Judge Your MMM

The most dangerous media mix model (MMM) isn’t the one that’s obviously wrong—it’s the one that looks right but quietly misleads you.

A model can perfectly match past sales trends while being completely wrong about why those trends happened. Internal validation metrics like RMSE, MAPE, and R-squared create a false sense of confidence: MMMs are flexible statistical tools that will almost always show a good in-sample fit, even when their channel-level estimates are completely wrong.

So how do you know if your MMM is trustworthy? In this article, we break down why internal metrics are misleading and walk you through the validation methods that actually prove whether a model works: geo holdouts, on/off experiments, and out-of-sample tests.

Why Internal Validation Metrics Are Misleading

1. MMMs Will Almost Always Show a “Good Fit” In-Sample

One of the biggest misconceptions about MMMs is that if a model fits historical data well, it must be producing accurate results. But that’s just not true.

With enough complexity, any MMM can be made to fit past data perfectly – even if its underlying causal inferences are completely incorrect.

For example, you could build a simple model in Excel that perfectly matches your historical sales data by adding enough variables and tweaking coefficients. 

But would that model accurately predict what happens when you shift your budget? Absolutely not. It would just be fitting the noise in your historical data, not capturing causal relationships.

This is why you can’t rely on “goodness-of-fit” metrics alone. They don’t tell you whether your model’s marketing effectiveness estimates are accurate; they just tell you how well the model conforms to the patterns it has already seen.

2. In-Sample R-Squared and MAPE Encourage Overfitting

Many MMM vendors highlight their model’s high R-squared or low MAPE as evidence that it’s working well. Treat that as a red flag.

R-squared is a measure of how much variance in the dependent variable (e.g., sales) is explained by the model. But it says nothing about whether the model is estimating true causal relationships.

In fact, in-sample R-squared is trivially easy to inflate.

If you want to increase R-squared, all you have to do is:

  • Add more variables.
  • Introduce dummy variables to correct for every small fluctuation in the data.
  • Overfit the model to the noise in your dataset.

Let’s say you build an MMM and find that the R-squared is 0.82. You then add a set of arbitrary dummy variables, maybe marking special promotional periods, holidays, or random spikes in sales, and now your R-squared jumps to 0.97!

Did you improve the model? No. You just memorized past data better.
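
Here’s a minimal sketch of that effect on simulated data, using scikit-learn’s LinearRegression. All numbers are made up and nothing here comes from a real MMM; the point is simply that junk dummy variables push in-sample R-squared up without adding any causal signal.

```python
# Illustrative only: in-sample R-squared climbs when we add arbitrary
# dummy variables, even though they carry no causal signal.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_weeks = 104

# Simulated "truth": sales driven by one media channel plus noise.
spend = rng.uniform(0, 100, size=n_weeks)
sales = 50 + 2.0 * spend + rng.normal(0, 40, size=n_weeks)

# Honest model: spend only.
X_base = spend.reshape(-1, 1)
r2_base = LinearRegression().fit(X_base, sales).score(X_base, sales)

# "Improved" model: spend plus 60 arbitrary dummies standing in for
# ad-hoc promo/holiday/outlier flags.
dummies = rng.integers(0, 2, size=(n_weeks, 60))
X_big = np.hstack([X_base, dummies])
r2_big = LinearRegression().fit(X_big, sales).score(X_big, sales)

print(f"R-squared, spend only:         {r2_base:.2f}")
print(f"R-squared, spend + 60 dummies: {r2_big:.2f}")  # higher, purely from memorizing noise
```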

MAPE (Mean Absolute Percentage Error) has similar issues. 

If you optimize for MAPE, your model will focus on minimizing in-sample errors, which often just leads to overfitting. Again, the model will look accurate on paper, but you won’t be able to trust its estimates.
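
For reference, MAPE is just the average absolute error expressed as a percentage of actual sales, so a heavily parameterized model can drive it toward zero in-sample simply by memorizing the data. The figures below are illustrative only.

```python
# MAPE computed on in-sample fitted values; all numbers are made up.
import numpy as np

actual = np.array([100, 120, 90, 110, 105])
fitted = np.array([ 99, 121, 91, 108, 106])  # an overfit model can get this close to history

mape = np.mean(np.abs((actual - fitted) / actual))
print(f"In-sample MAPE: {mape:.1%}")  # looks great, but says nothing about causal accuracy
```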

The core job of an MMM isn’t just to fit past data; it’s to estimate the causal impact of marketing spend. That’s what your validation needs to test.

The Right Way to Validate an MMM: External Validation Methods

1. Geographic Holdout Tests

One of the most reliable ways to validate an MMM is by running geo holdout tests: real-world experiments where you adjust marketing spend in specific geographic regions while other regions act as controls.

How it works:

  1. Select a set of test markets where you reduce spend on a specific channel (e.g., cut TV spend in 10 cities).
  2. Keep other factors as stable as possible.
  3. Measure the actual impact on sales in test vs. control regions.
  4. Compare the observed results to the MMM’s predictions.

If the model is accurate, its forecasts should match what happens in the real-world test. If not, something is wrong.
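
As a sketch of how you might score such a test, the snippet below makes a simple difference-in-differences comparison between test and control regions and sets it against the model’s prediction. The region names, sales figures, and the MMM-predicted effect are all assumptions for illustration, not real results.

```python
# Hypothetical geo holdout readout: compare the observed test-vs-control
# change in sales against what the MMM predicted the spend cut would do.
import pandas as pd

results = pd.DataFrame({
    "region":     ["test_1", "test_2", "control_1", "control_2"],
    "group":      ["test", "test", "control", "control"],
    "pre_sales":  [500_000, 430_000, 510_000, 450_000],   # pre-period weekly sales (assumed)
    "post_sales": [460_000, 392_000, 515_000, 447_000],   # weekly sales during the TV cut (assumed)
})

results["pct_change"] = results["post_sales"] / results["pre_sales"] - 1
observed_effect = (
    results.loc[results["group"] == "test", "pct_change"].mean()
    - results.loc[results["group"] == "control", "pct_change"].mean()
)

mmm_predicted_effect = -0.04  # the sales impact the MMM implies for the cut (assumed)

print(f"Observed incremental effect: {observed_effect:+.1%}")
print(f"MMM-predicted effect:        {mmm_predicted_effect:+.1%}")
# A large gap here means the model's TV estimate shouldn't drive budget decisions.
```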

2. On/Off Experiments and Budget Shifts

Another simple way to test an MMM is by turning a channel on or off and checking if the model’s predictions hold up.

For example, if the MMM estimates that TikTok ads have a high ROI, you can validate that by pausing TikTok ads for a set period and seeing whether conversions drop by the predicted amount.

If they don’t, the model was overestimating TikTok’s impact.
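
Here’s a back-of-the-envelope version of that check. The baseline, pause-period conversions, and the MMM-implied share are all assumed numbers for illustration.

```python
# Compare the conversion drop during a channel pause to the drop the MMM implies.
baseline_daily_conversions = 1_200   # average while the channel was on (assumed)
paused_daily_conversions   = 1_150   # average during the pause (assumed)

actual_drop    = 1 - paused_daily_conversions / baseline_daily_conversions
predicted_drop = 0.15                # share of conversions the MMM attributes to the channel (assumed)

print(f"MMM-predicted drop: {predicted_drop:.1%}")
print(f"Actual drop:        {actual_drop:.1%}")
# ~4% observed vs. 15% predicted: the model is overstating this channel's impact.
```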

3. Out-of-Sample Forecast Accuracy

One of the best ways to protect against overfitting is to test the model on data it has not seen before.

At Recast, every model update is tested using a holdout period:

  • We run the model twice: once with the full dataset and once with data only up to 30 days ago.
  • We then compare the model’s forecasted results for the last 30 days against actual sales.

If the model struggles to predict future performance, that’s a sign it isn’t capturing true causal effects and is just fitting historical data.
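
As a rough sketch of that kind of check (not Recast’s actual pipeline), you can split off the last 30 days, refit on the rest, and score the forecast against actuals. The `fit_mmm` and `forecast` functions below are placeholders for your own modeling code, and the data frame is assumed to have a daily DatetimeIndex and a “sales” column.

```python
# Generic 30-day holdout check; fit_mmm and forecast are placeholders.
import numpy as np
import pandas as pd

def holdout_check(df: pd.DataFrame, fit_mmm, forecast, holdout_days: int = 30) -> float:
    cutoff = df.index.max() - pd.Timedelta(days=holdout_days)
    train = df.loc[:cutoff]                        # data the model is allowed to see
    holdout = df.loc[cutoff + pd.Timedelta(days=1):]

    model = fit_mmm(train)                         # refit on the truncated history
    predicted = forecast(model, holdout.index)     # forecast the unseen 30 days

    # Out-of-sample MAPE: this is the error number that actually matters.
    return float(np.mean(np.abs((holdout["sales"] - predicted) / holdout["sales"])))
```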

TL;DR:

  • Many MMMs look reliable but can quietly mislead you because they lack real-world validation.
  • Before hiring an MMM vendor or building an in-house model, ask: how will we prove this model is right? If the answer is “R-squared” or “MAPE,” dig deeper.
  • Recast doesn’t rely on in-sample metrics like R-squared or MAPE to prove accuracy. Instead, we:
    • Help our clients regularly validate their model with real-world experiments (geo holdouts, budget shifts, on/off tests).
    • Run out-of-sample forecast tests to check if the model correctly predicts unseen data.
    • Eliminate analyst bias by preventing manual tweaking of control variables.

About The Author