Is R-squared the Right Metric to Judge your MMM?

You have probably seen R-squared everywhere. It is one of the most commonly cited metrics in statistics. It’s taught in every introductory stats course, it’s included by default in nearly all statistical software packages, and it’s usually the first number you see when you run a regression. But what does it really mean?

Technically, it measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. Mathematically, it’s 1 minus the ratio of the residual sum of squares to the total sum of squares.
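To make that concrete, here’s a minimal sketch of the arithmetic in Python. The sales figures and fitted values are made up purely for illustration.

```python
# Hypothetical numbers, just to show the calculation behind R-squared.
import numpy as np

y = np.array([120, 135, 150, 160, 155, 170])       # actual weekly sales (made up)
y_hat = np.array([118, 140, 148, 158, 160, 168])   # model's fitted values (made up)

ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.3f}")
```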

In simpler language: R-squared tells you how well your model fits the data it was trained on. 

An R-squared of 0.80 means 80% of the variance in your outcome variable (say, sales) is “explained” by your inputs (like TV, paid search, and social). An R-squared of 1.0 means perfect fit. A model with 0.0 explains nothing.

And because it’s easy to calculate and interpret, and seems to give you a clear signal of model quality, it has become the default metric for many analysts.

In media mix modeling, you’re building a model to explain how marketing channels drive business outcomes, and stakeholders want something they can trust – so R-squared has become a shorthand for model performance.

You’ll often hear: “What’s the R-squared?” Or, “Can we improve the R-squared?” We’ve even seen teams use it as the primary KPI for model success, with 0.85 or 0.9 treated as the threshold for an acceptable model.

A lot of MMM vendors also use R-squared in pitch decks or reports. It feels scientific. And when it’s high, it gives analysts, modelers, and execs a sense of security.

But that is the problem. R-squared gives you the illusion of model quality. Here’s why.

Why R-squared fails in marketing mix modeling

The problem with treating R-squared as the gold standard for model quality is that it can actively push you toward making bad decisions.

The core issue is that R-squared only measures in-sample fit. It tells you how well your model matches the data it was trained on, but not how well it will perform on data it hasn’t seen. 

And when you’re using MMM, the goal is to use that model to make future marketing decisions – so in-sample performance is the wrong thing to optimize.

Let’s say your model’s initial R-squared comes out to 0.65. That might feel too low. So you start “fixing” it. You add controls for every spike or dip the model missed. You introduce dummy variables for specific days. You tweak the functional forms to squeeze out more fit. This is incredibly easy to do in MMM.

Now your R-squared is 0.92. On paper, the model looks great: it explains almost all the variance in sales. But what you’ve actually done is overfit the historical data. The model hasn’t learned anything about the underlying drivers of performance – you’ve just made yourself more confident in a flawed model.
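Here’s a hypothetical illustration of that dynamic, using simulated data and a plain linear regression standing in for a full MMM. Adding week-specific dummy variables pushes the in-sample R-squared up, while the held-out weeks see no improvement.

```python
# Simulated example: two media channels drive sales; week-specific dummies
# only memorize noise, so they flatter in-sample fit without helping forecasts.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_weeks = 104
spend = rng.uniform(10, 100, size=(n_weeks, 2))               # two channels' weekly spend
sales = 50 + spend @ np.array([1.5, 0.8]) + rng.normal(0, 25, n_weeks)

train, test = slice(0, 80), slice(80, None)                   # last 24 weeks held out

# Honest model: media spend only
honest = LinearRegression().fit(spend[train], sales[train])

# "Improved" model: a one-off dummy for each of 40 training weeks
dummies = np.eye(n_weeks)[:, :40]
X_over = np.hstack([spend, dummies])
overfit = LinearRegression().fit(X_over[train], sales[train])

for name, model, X in [("honest", honest, spend), ("overfit", overfit, X_over)]:
    print(name,
          "in-sample R2:", round(r2_score(sales[train], model.predict(X[train])), 2),
          "holdout R2:", round(r2_score(sales[test], model.predict(X[test])), 2))
```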

R-squared also tells you nothing about incrementality or causality. It doesn’t help you understand whether a given channel is actually driving sales, or just correlated with them. Two variables can move together in the data without one causing the other. 

That’s why R-squared is a terrible primary metric for judging a media mix model. So what are your other options?

How to validate a model the right way

If you want to know whether a marketing mix model is good, there’s only one question that really matters: Can it predict the future?

That’s the gold standard. Not in-sample fit. Not statistical significance. Not R-squared. The only way to know if a model is trustworthy is to test whether its predictions hold up on data it hasn’t seen before.

The simplest way to do this is holdout forecasting. You train the model on a subset of historical data – say, January to October – and then ask it to forecast what happens in November and December. Once you have actual results, you can compare them to the model’s predictions. If your model is well-calibrated, it should get reasonably close. 
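A rough sketch of what that looks like in code, assuming a weekly dataset with hypothetical file and column names, and a simple regression standing in for the full MMM:

```python
# Minimal holdout-forecast check. File name and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("weekly_mmm_data.csv", parse_dates=["week"])
features = ["tv_spend", "search_spend", "social_spend"]

train = df[df["week"] < "2024-11-01"]    # January through October
test = df[df["week"] >= "2024-11-01"]    # November and December held out

model = LinearRegression().fit(train[features], train["sales"])
forecast = model.predict(test[features])

# Mean absolute percentage error on the held-out weeks (lower is better)
mape = (abs(test["sales"] - forecast) / test["sales"]).mean() * 100
print(f"Holdout MAPE: {mape:.1f}%")
```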

Another technique is the parameter recovery test using synthetic data. Here, you generate artificial data where you already know the “true” impact of each channel. Then you feed that data into your model and see if it can recover those true parameters. 
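In its simplest form, a parameter recovery test looks something like the sketch below: simulate data where the true channel effects are known, refit, and check whether the estimates land near the truth. A real MMM would also simulate adstock and saturation; this toy version keeps the response linear.

```python
# Toy parameter-recovery check with a known "ground truth" per channel.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
true_effects = np.array([2.0, 0.5, 1.2])            # known true effect per channel
spend = rng.uniform(0, 100, size=(156, 3))          # three years of weekly spend
sales = 200 + spend @ true_effects + rng.normal(0, 30, 156)

model = LinearRegression().fit(spend, sales)
print("true effects     :", true_effects)
print("recovered effects:", model.coef_.round(2))   # should land close to the truth
```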

You can also validate your model with real-world experimentation, like geo-based lift tests. For example, you increase YouTube spend by 20% in one geographic region while keeping all else constant, and compare results to a control region. 
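The readout from a test like that is essentially a difference-in-differences calculation. Here’s a simplified sketch with hypothetical pre/post sales totals for one test region and one matched control region; a real test would use multiple geos and proper statistical inference.

```python
# Simplified geo-lift arithmetic with made-up pre/post totals.
import pandas as pd

geo = pd.DataFrame({
    "period": ["pre", "pre", "post", "post"],
    "region": ["test", "control", "test", "control"],
    "sales":  [1000.0, 980.0, 1150.0, 1010.0],
})

pivot = geo.pivot(index="region", columns="period", values="sales")
lift_test = pivot.loc["test", "post"] - pivot.loc["test", "pre"]
lift_ctrl = pivot.loc["control", "post"] - pivot.loc["control", "pre"]

incremental = lift_test - lift_ctrl    # lift beyond what the control region saw
print(f"Estimated incremental sales from the spend increase: {incremental:.0f}")
```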

Budget shift tests are another option. If your model recommends cutting TV by 30% and moving that spend to paid search, the outcomes should align with the model’s forecast once that change is implemented. If they don’t, something’s off.

The point is this: if your model can’t reliably predict what’s coming next, it’s not a model you should be making decisions with.

The questions smart marketers should be asking

If you’re considering MMM for your org and talking to different vendors (or to your internal team), these are the questions we recommend asking to dig deeper and see whether they understand the metrics that matter for your business.

1. How do you validate model outputs?
This is the most important question. If a vendor can’t clearly explain how they test their model’s predictions in the real world, that’s a red flag. 

2. What’s your out-of-sample forecast accuracy?
Any model can fit the past. What matters is how well it forecasts the future. Ask vendors to show performance on held-out data.

3. Do you run parameter recovery tests?
If a model can’t recover known true effects from synthetic data, it won’t work in the real world either.

4. How does the model handle budget changes?
What happens if you double your Facebook spend or cut TV in half? A trustworthy model should forecast the impact of those changes with reasonable accuracy.

5. What evidence do you have that this model estimates causal impact?
Marketing decisions depend on understanding what caused what. If the model can’t differentiate correlation from causation – and prove it – it’s not ready for forecasting.

These are baseline questions a vendor should be able to answer clearly and with proof – if they can’t, that’s a red flag.

TL;DR

  • A high R-squared doesn’t mean your media mix model is good – it just means it fits the past.
  • R-squared is easily manipulated and rewards overfitting, which makes it dangerous for marketing decision-making.
  • The only way to trust a model is to test how well it predicts the future using out-of-sample data or real-world experiments.
  • If your MMM vendor can’t answer tough questions about validation and causality, walk away.