The Correlation vs Causation Challenge in Marketing Mix Models

The distinction between correlation and causation is a fundamental challenge within MMM, and it has significant implications for the efficacy of your marketing efforts. It is not just about interpreting data correctly; it’s about making sure your budget isn’t wasted or misallocated.

This article delves into why this distinction is critical, the challenges of identifying causation in MMM, and the methodologies used to validate and verify the accuracy of an MMM.

The Challenge of Identifying Causation in MMM

Correlation, in its essence, is about identifying patterns where two variables move in tandem. Picture this: you launch a new digital campaign, and simultaneously, your online sales surge. It’s tempting to immediately draw a line connecting these two events. 

But here’s where it gets tricky – correlation does not necessarily imply causation. 

It’s like observing that as ice cream sales increase, so does the number of people at the beach. Clearly, buying more ice cream doesn’t cause more people to flock to the beach – both are driven by a hidden third factor: warm weather.

Similarly, in marketing, these observable correlations can often lead us astray if we hastily interpret them as cause-and-effect relationships.
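To make the confounding concrete, here is a minimal simulation (with made-up numbers) in which a hidden third variable – temperature – drives both ice cream sales and beach attendance. The two series end up strongly correlated even though neither causes the other:

```python
import numpy as np

rng = np.random.default_rng(42)

# A hidden "confounder": daily temperature over one year
temperature = rng.normal(loc=20, scale=8, size=365)

# Temperature drives BOTH series; neither causes the other
ice_cream_sales = 50 + 3.0 * temperature + rng.normal(0, 10, size=365)
beach_visitors = 200 + 12.0 * temperature + rng.normal(0, 40, size=365)

# The two series are strongly correlated anyway
r = np.corrcoef(ice_cream_sales, beach_visitors)[0, 1]
print(f"correlation(ice cream, beach visitors) = {r:.2f}")  # roughly 0.85
```

Controlling for temperature makes the spurious relationship disappear – which is exactly what a well-specified MMM has to do with seasonality, promotions, and other confounders.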

The real crux of MMM lies in unraveling causation – determining if and how one factor directly influences another. For instance, discerning whether a specific ad campaign is genuinely driving sales, independent of external factors such as seasonal trends or economic shifts, is essential. 

Identifying causation means understanding what is actually incremental – the number one thing marketers should focus on.

Examples of Challenges in Isolating Causation

Isolating causation is not easy. We often encounter endogeneity issues, where there’s a reciprocal relationship between variables. 

Take, for example, the decision to increase advertising spend. Is this a response to rising sales, or is it the cause? This chicken-and-egg dilemma is a classic example of the intricate scenarios marketers are dealing with. Consumer behavior, competitor strategies, and market conditions are in a constant state of flux, influencing, and being influenced by, our marketing initiatives.
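To see how reverse causality fools a naive analysis, consider this illustrative sketch (all numbers invented): a brand sets each week’s ad spend based on the previous week’s sales. Spend has zero true effect on sales here, yet a simple regression still finds a strong positive “effect”:

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 200

# Sales have their own momentum (seasonality, word of mouth, repeat purchase)
sales = np.zeros(weeks)
sales[0] = 100
for t in range(1, weeks):
    sales[t] = 20 + 0.8 * sales[t - 1] + rng.normal(0, 5)

# Budget rule: spend more after good weeks. Spend RESPONDS to sales;
# in this simulation it has zero causal effect on them.
spend = np.zeros(weeks)
spend[1:] = 0.3 * sales[:-1] + rng.normal(0, 2, size=weeks - 1)

# A naive regression of sales on spend still finds a big "effect"
slope, intercept = np.polyfit(spend[1:], sales[1:], deg=1)
print(f"naive estimated effect of spend on sales: {slope:.2f}")  # clearly > 0
```

Any model that ignores how budgets respond to performance will mistake this feedback loop for incrementality.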

Adding another layer to this complexity are latent variables – those unseen factors that, while not directly measured, significantly impact the relationships we analyze. These variables can create misleading correlations, leading to erroneous conclusions about the drivers of sales or brand engagement.

Misinterpreting correlations as causal relationships can lead to ill-informed decisions, inefficient allocation of resources, and ultimately, subpar business performance. 

For instance, attributing a spike in sales to a paid social campaign could drive your team to decide to increase your paid social budget – but what if there wasn’t actually a causal relationship? You’d be wasting your budget on a non-incremental initiative.  

Marketing mix modeling (MMM) navigates these challenges through sophisticated statistical techniques and econometric models. A good model should be able to isolate the effects of individual marketing actions from the myriad other variables at play and tell us – with confidence estimates – what is actually incremental and what is not.

But how can you trust that your model has actually picked up causal inferences? How do you know if you can follow its recommendations?

Validating Marketing Mix Models

MMM is, fundamentally, a causal inference problem. We want to understand how different changes we might make to our marketing budgets will change our business performance. This is not simply a “prediction” problem, but rather an attempt to understand the true, causal relationships between our marketing activity and our business outcomes.

Validating causal inference models is much, much more difficult than validating simple prediction-only models, so we need a different toolset and approach for validating them.

We went in-depth on this very topic in the latest episode of our Modern Analytics for Marketers series, where Recast co-founder Michael Kaminsky delved into why validation is the cornerstone of effective MMM. Feel free to check it out!

Now, the fundamental problem is that the thing we care about – the true incremental impact of an additional dollar spent on some marketing channel – is unknown and unknowable. No one knows, or can know, the true value of an additional dollar spent on Meta: there is no fundamental law of physics or nature to fall back on, and there’s no way to ask people or track them well enough to know what that true impact is.

So our job as modelers is to try and validate what we’ve learned from our model, without being able to know what the true answer really is. This is the fundamental problem of validating MMMs.

Beyond just the basics of doing causal inference, the MMM problem is compounded because things change over time. What was true about marketing performance 6 months ago might no longer be true today. Thus, model validation in MMM is not a problem you solve once; it has to be addressed continually, over and over again.

How NOT to validate your MMM:

Traditional validation methods, such as relying on statistical significance or in-sample goodness-of-fit metrics like R-squared, are often inadequate in the context of MMM. These methods might indicate how well a model fits the data it was trained on, but they say nothing about the model’s predictive power or its ability to generalize to new, unseen data.

For example, a model with a high R-squared value might seem impressive, but this could be a result of overfitting – capturing the noise along with the signal – which diminishes its utility in making real-world predictions.
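A toy illustration of the danger (synthetic data, deliberately overflexible model): a high-degree polynomial fit to early weeks scores a near-perfect in-sample R-squared, then falls apart on later, held-out weeks:

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# True relationship is simple and noisy: sales = 2 * spend + noise
spend = np.linspace(0, 10, 30)
sales = 2 * spend + rng.normal(0, 3, size=30)

# Overfit: a degree-9 polynomial trained on the first 20 weeks only
train, test = slice(0, 20), slice(20, 30)
coefs = np.polyfit(spend[train], sales[train], deg=9)

print("in-sample R^2:    ", round(r_squared(sales[train], np.polyval(coefs, spend[train])), 2))
print("out-of-sample R^2:", round(r_squared(sales[test], np.polyval(coefs, spend[test])), 2))
```

The in-sample number looks great precisely because the model has memorized the noise; the out-of-sample number reveals that it learned nothing usable about the real relationship.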

How to ACTUALLY validate your MMM:

While there are different ways to validate your model, the most telling indicator of a model’s validity is its ability to accurately predict outcomes in out-of-sample scenarios. This involves using the model to make predictions about future or unseen data and comparing these predictions against actual outcomes.

If a model truly understands the causal relationships it is trying to measure, it should be able to accurately forecast future events, even when marketing conditions or strategies change. 
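In practice, the simplest version is a time-based holdout: fit the model only on earlier weeks, forecast the held-out weeks blind, and score the forecasts against what actually happened. A minimal sketch on synthetic data (the linear model here is just a stand-in for a real MMM):

```python
import numpy as np

rng = np.random.default_rng(7)

# Two years of illustrative weekly data: sales = 50 + 2 * spend + noise
spend = rng.uniform(0, 10, size=104)
sales = 50 + 2.0 * spend + rng.normal(0, 5, size=104)

# Hold out the final 12 weeks; the model never sees them during fitting
split = 104 - 12
X_train = np.column_stack([np.ones(split), spend[:split]])
coefs, *_ = np.linalg.lstsq(X_train, sales[:split], rcond=None)

# Forecast the holdout period and score against what actually happened
X_test = np.column_stack([np.ones(12), spend[split:]])
forecast = X_test @ coefs
mape = np.mean(np.abs((sales[split:] - forecast) / sales[split:])) * 100
print(f"holdout MAPE: {mape:.1f}%")
```

The key design choice is that the split respects time: the model never gets to peek at the period it is being scored on.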

So, when can holdout forecast accuracy actually help us validate whether the model has truly estimated the underlying causal relationships?

What we want to validate is whether the causal relationships estimated by the model actually hold in the real world.

At Recast we like to say: if the model can consistently forecast the future in a system without information leakage in the face of exogenous budget changes, then the model likely has picked up the true causal relationships.
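And because marketing performance drifts over time, a single lucky holdout isn’t enough – consistency is the bar. One way to check for it, sketched below on the same kind of synthetic data as above (again, the linear model is only a stand-in for a real MMM), is rolling-origin or “walk-forward” validation: re-fit the model at successive points in time and confirm that forecast accuracy holds up at every origin:

```python
import numpy as np

rng = np.random.default_rng(7)

# Same illustrative setup: two years of weekly data
spend = rng.uniform(0, 10, size=104)
sales = 50 + 2.0 * spend + rng.normal(0, 5, size=104)

def forecast_mape(origin, horizon=8):
    """Re-fit on weeks [0, origin), forecast the next `horizon` weeks, return MAPE."""
    X_train = np.column_stack([np.ones(origin), spend[:origin]])
    coefs, *_ = np.linalg.lstsq(X_train, sales[:origin], rcond=None)
    X_future = np.column_stack([np.ones(horizon), spend[origin:origin + horizon]])
    actual = sales[origin:origin + horizon]
    return np.mean(np.abs((actual - X_future @ coefs) / actual)) * 100

# Accuracy should hold up at EVERY origin, not just one lucky holdout
for origin in range(52, 96, 8):
    print(f"origin week {origin}: MAPE = {forecast_mape(origin):.1f}%")
```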

TL;DR:

Correlation vs. causation is a statistical problem with real-life consequences for how brands measure incrementality and how efficiently they allocate their budgets. MMM is built around causality – not correlation – and that’s why model validation is so important: it’s how we know whether the model has actually picked up causal relationships and can be trusted.
