MMM is hard – we totally get that, but there are common mistakes made by consultants in the MMM space that need addressing.
Here are 10 of them:
1 – Focusing on the in-sample R-squared metric for MMM models.
The standard definition of R-squared is “the proportion of variation in the dependent variable explained by the model”. Many people think that a higher R-squared means a better model.
However, this is false for several reasons.
For one, most people are only looking at their model’s in-sample R-squared. As a metric, in-sample R-squared promotes overfitting, since it’s trivially easy to boost it just by adding more variables. In an MMM, you can drive up R-squared by adding meaningless dummy variables on the days where your model has its biggest misses – this actually makes for a worse model, since it will make it harder to predict what will happen in the future.
In our opinion, the right metric to focus on when evaluating an MMM is out-of-sample holdout predictive fit.
At Recast, we run the model twice with every update: once with the complete data set and once with data only up to 30 days ago. We use the latter to validate the model’s forecast accuracy.
Consistently accurate predictions in the face of marketing budget changes give confidence that the model has picked up the underlying signal and will continue to make accurate forecasts in the future.
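To make this concrete, here’s a minimal sketch in Python of what a 30-day holdout check can look like. The data, the column names, and the simple linear model are all invented for illustration – this is not Recast’s actual pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Invented daily data: two spend channels and revenue.
df = pd.DataFrame({
    "facebook_spend": rng.gamma(2.0, 500, 365),
    "search_spend": rng.gamma(2.0, 300, 365),
})
df["revenue"] = (5000 + 2.1 * df["facebook_spend"] + 1.4 * df["search_spend"]
                 + rng.normal(0, 1500, 365))

# Hold out the most recent 30 days and fit only on the rest.
train, holdout = df.iloc[:-30], df.iloc[-30:]
channels = ["facebook_spend", "search_spend"]

X_train = np.column_stack([np.ones(len(train)), train[channels]])
X_hold = np.column_stack([np.ones(len(holdout)), holdout[channels]])
coef, *_ = np.linalg.lstsq(X_train, train["revenue"], rcond=None)

# The question that matters: how well does the model predict days it never saw?
pred = X_hold @ coef
mape = np.mean(np.abs(pred - holdout["revenue"]) / holdout["revenue"])
print(f"30-day holdout MAPE: {mape:.1%}")
```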
2 – Making assumptions that don’t line up with what anyone believes about how marketing actually works.
The vast majority of models assume that marketing performance is constant over time. They’ll look at two or three years of historical data and the results will say that Facebook has an ROI of, for example, 4.3x over that entire time period.
But no marketer believes that Facebook has had one constant performance level over the last 2-3 years. So many things have fundamentally changed that this assumption just doesn’t make any sense.
A lot of these consultants are using fairly basic tools and making very strong assumptions to make those tools work. But those assumptions unfortunately don’t match reality and this leads to inaccurate results.
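Here’s a toy illustration of why the constant-performance assumption matters (the data and numbers are made up): if a channel’s true ROI drifts over two years, a single pooled estimate papers over the drift, while even crude rolling-window estimates reveal it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 730

# Invented world where a channel's true ROI decays from 4x to 1.5x over two years.
spend = rng.gamma(2.0, 400, n_days)
true_roi = np.linspace(4.0, 1.5, n_days)
revenue = 3000 + true_roi * spend + rng.normal(0, 1000, n_days)

def fit_roi(s, r):
    """Plain OLS slope of revenue on spend."""
    X = np.column_stack([np.ones(len(s)), s])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return coef[1]

# A single pooled estimate reports one "average" ROI for the whole period...
print(f"pooled ROI estimate: {fit_roi(spend, revenue):.2f}")

# ...while even crude 90-day windows show that performance was drifting all along.
for start in range(0, n_days - 90 + 1, 160):
    window = slice(start, start + 90)
    print(f"days {start:3d}-{start + 90:3d}: ROI estimate {fit_roi(spend[window], revenue[window]):.2f}")
```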
3 – Mishandling seasonality by estimating it separately and stripping it out of the data.
Some consultants will try to estimate the seasonality, subtract it from the data, and fit the model on what’s left over – which doesn’t make sense because marketing performance and seasonality are actually intertwined.
If our marketing is more effective in the summer than it is in the winter, pulling out seasonality is actually a huge modeling mistake. The model assumes that there’s no relationship between marketing performance and seasonality, but that’s just backwards and every marketer knows that’s not true.
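Here’s a minimal sketch of the difference, with made-up data. Treating seasonality as something separate from marketing (roughly what the subtract-it-out approach amounts to) forces one ROI for the whole year, while letting the spend coefficient interact with the season recovers the fact that summer dollars work harder.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 365
day = np.arange(n_days)
summer = ((day >= 150) & (day < 240)).astype(float)

# Invented world where each marketing dollar genuinely works harder in the summer.
spend = rng.gamma(2.0, 300, n_days)
revenue = (2000 + (1.5 + 2.0 * summer) * spend + 800 * summer
           + rng.normal(0, 500, n_days))

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Seasonality treated as separate from marketing: one spend coefficient all year.
separate = ols(np.column_stack([np.ones(n_days), spend, summer]), revenue)

# Seasonality allowed to interact with marketing: ROI can differ by season.
interact = ols(np.column_stack([np.ones(n_days), spend, summer, spend * summer]), revenue)

print(f"one-ROI-all-year estimate: {separate[1]:.2f}")
print(f"winter ROI (interaction):  {interact[1]:.2f}")
print(f"summer ROI (interaction):  {interact[1] + interact[3]:.2f}")
```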
4 – Using automated variable selection to account for multicollinearity.
Appropriate handling of multicollinearity is one reason why the Bayesian framework is more powerful than traditional approaches.
One of the biggest MMM challenges is that a lot of variables are correlated because marketers tend to move spend in concert.
When things go well, we increase spend on all our channels simultaneously. And when things don’t go well, we pull back on all of them too.
That creates multicollinearity – especially on channels that are very tightly coupled like Facebook prospecting and Facebook retargeting.
When you have this type of correlation structure, you’re violating the assumptions that are required to get unbiased inferences out of an ordinary least squares regression. If you drop those variables into a standard regression equation, you will get back bad and biased results.
With two highly correlated variables in an OLS regression, the results will often show one variable as very positive and effective while the other comes back negative – as if spending a dollar on that channel returned less than nothing. That’s a really big problem.
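Here’s a toy simulation of that failure mode. The channel names, spend levels, and true ROIs are all invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n_days = 365

# Prospecting and retargeting spend move almost in lockstep (invented numbers).
prospecting = rng.gamma(2.0, 500, n_days)
retargeting = 0.5 * prospecting + rng.normal(0, 20, n_days)

# In this toy world, BOTH channels truly have a positive effect on revenue.
revenue = 4000 + 2.0 * prospecting + 2.0 * retargeting + rng.normal(0, 2000, n_days)

X = np.column_stack([np.ones(n_days), prospecting, retargeting])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)

# With this much collinearity the individual estimates are wildly unstable:
# one channel often looks great while the other looks useless or negative.
print(f"prospecting coefficient: {coef[1]:+.2f}")
print(f"retargeting coefficient: {coef[2]:+.2f}")
```

Rerun it with a different seed and the two coefficients can swap roles entirely. That instability is exactly the bias problem described above.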
At Recast we have a fully generative Bayesian model, and when it sees multicollinearity, it will show uncertainty about the trade-off in performance between those two different channels. We think this is RIGHT because, in the case of multicollinearity, there’s truly less signal in the data for those channels. We want the model to correctly reflect that uncertainty.
If Facebook prospecting and Facebook retargeting are highly correlated, the Recast model will not tell us that one is positive and one is not. Instead, what the model says is, “We’re uncertain about which of these two is driving the effect. If you care about being able to differentiate between those two channels, you should try to inject more spend into one and not the other so that we have more variation to exploit. We will then learn which one is doing what.”
We think that’s a much more realistic approach and the right way of thinking about multicollinearity.
5 – Assuming that promotions and holidays are independent of marketing performance, rather than directly impacted by it.
The way that a lot of MMM modelers handle holidays and promotional events is wrong.
Let’s talk about Black Friday:
A lot of retail businesses have a huge spike around Black Friday and Christmas that accounts for a large portion of their sales for the whole year. I’ve seen modelers include a dummy variable on those days, which builds in the assumption that the impact of Black Friday is totally independent of marketing spend.
But that’s not correct.
In the real world, marketers tend to increase marketing spend leading up to big promotional days to capture all of the demand for people who are planning on buying on Black Friday. If you control for holidays via a dummy variable, you are risking dramatically under-crediting the impact of marketing activity.
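Here’s a toy simulation of that risk, with invented numbers: the Black Friday lift is genuinely created by the heavied-up spend, but an additive dummy quietly takes the credit for it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 365

# Baseline spend, plus a planned heavy-up during Black Friday week
# (the day indices and budgets are invented for illustration).
spend = rng.gamma(2.0, 300, n_days)
bf_week = np.zeros(n_days)
bf_week[323:330] = 1.0
spend[323:330] = 2500.0

# Toy world: every dollar returns 2x normally and 5x during BF week,
# because that spend is capturing holiday demand.
true_roi = 2.0 + 3.0 * bf_week
revenue = 2000 + true_roi * spend + rng.normal(0, 500, n_days)

# The analyst's model: spend plus an additive holiday dummy.
X = np.column_stack([np.ones(n_days), spend, bf_week])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)

# The dummy absorbs the extra holiday demand that marketing actually captured.
print(f"revenue truly driven by marketing:      {np.sum(true_roi * spend):,.0f}")
print(f"revenue the model credits to marketing: {np.sum(coef[1] * spend):,.0f}")
```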
Also, some promotional events might actually have a net negative effect once you account for purchases being pulled forward or delayed.
Let’s say your brand runs a 20%-off sale every Black Friday. Consumers get trained to know that a discount is coming, so they might hold off from buying in the weeks before the sale. Or people who were going to buy anyway will pull their purchase forward to take advantage of the discount.
Sure, you might see a spike in sales on Black Friday…
…but all you might have actually done is robbed yourself and given away a bunch of margin with your discounts.
Your MMM needs to take these effects into account or you risk over-crediting that sale.
6 – Using hard-coded long-time-shift variables to account for “brand effects” that aren’t actually based in reality.
This can lead to very badly misspecified models and really bad results:
One thing I’ve seen people do is apply transformations to their data before running it through a regression or their own MMM model.
They might say something like: “We’re going to assume that our display channels and our paid social will have short effects – within eight days. Mid-funnel channels like radio and print, we’re going to assume they have an effect of 30 days. And our top-of-funnel channels like TV, we’re going to assume an effect of 90 days.”
This might seem reasonable at first glance, but it actually leads to very badly misspecified models. They’ve built into their model the assumption that TV is going to be the most impactful, radio is going to be in the middle, and that digital channels are going to be the least impactful.
They’ll run the analysis and say the model said exactly what they expected. But that’s not based on reality! In many cases with this approach, there’s no way for the model to say that TV isn’t very impactful, because its effect has been spread over such a long window.
They’ve just baked their assumptions into the model. It’s a really bad modeling practice that no one should do, and that I believe comes from misaligned incentives between agencies and clients.
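For concreteness, here’s a minimal sketch of the pattern being criticized: a simple geometric-decay adstock with hard-coded carryover windows. The channel names and window lengths mirror the hypothetical above and are not taken from any real model.

```python
import numpy as np

def adstock(spend, carryover_days):
    """Geometric-decay adstock: carry each day's spend forward with the given half-life."""
    decay = 0.5 ** (1.0 / carryover_days)
    out = np.zeros_like(spend, dtype=float)
    for t in range(len(spend)):
        out[t] = spend[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

# The problematic pattern: effect lengths chosen by assumption, not learned from data.
ASSUMED_CARRYOVER = {
    "paid_social": 8,   # "short effect"
    "radio": 30,        # "mid-funnel"
    "tv": 90,           # TV's long, powerful effect is now baked in before fitting
}

rng = np.random.default_rng(4)
raw = {ch: rng.gamma(2.0, 200, 365) for ch in ASSUMED_CARRYOVER}

# These transformed columns are what the regression sees, so the model can no
# longer "disagree" with the assumed effect lengths.
transformed = {ch: adstock(x, ASSUMED_CARRYOVER[ch]) for ch, x in raw.items()}
```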
7 – Allowing the analyst/modeler to make too many decisions that influence the final results.
Many MMM models are flawed because the analysts have too much power.
The process of model building in a traditional approach to MMM often works like this:
An analyst gets the data, cleans it, and then runs different models. They might include some variables in one model and exclude them in others, apply ad-stocking or time-shift transformations to some variables, or make other transformations and assumptions.
All of those choices have a huge impact on the final results. The R-squared could be really high for every one of those models, yet they all yield very different results.
In the end, the analyst is going to choose the model that has the results that they like… and that they think will make sense to the client.
There are a couple of problems with that:
1/ The analyst has a lot of power to bias the results. They are, consciously or unconsciously, injecting a lot of bias into that model-building process.
Or say the modeler works for an advertising agency where they are getting paid based on the results of this model. They are very incentivized to make sure that certain channels look good.
2/ The model’s results will not express the true uncertainty from the model-building process, because they threw out a bunch of models whose results they didn’t like.
If they say, “TV has an ROI of 3.7X with a p-value of less than 0.03,” that’s not really reflective of the true amount of uncertainty – they actually looked at 100 other models with TV performance ranging from 0.5X to 8.7X.
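A quick simulation shows why that’s misleading. Here, TV has zero true effect by construction, and the random data filters are just a crude stand-in for the many arbitrary variable-selection and transformation choices an analyst can make – yet searching across specs still produces a wide spread of estimates and a tempting “best” p-value to report:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(5)
n_days, n_specs = 365, 100

# TV spend that, by construction, has ZERO effect on revenue.
tv_spend = rng.gamma(2.0, 400, n_days)
revenue = 5000 + rng.normal(0, 1500, n_days)

# "Trying 100 models": each spec keeps a different arbitrary subset of the data.
best_p, estimates = 1.0, []
for _ in range(n_specs):
    keep = rng.random(n_days) > 0.5
    fit = linregress(tv_spend[keep], revenue[keep])
    estimates.append(fit.slope)
    best_p = min(best_p, fit.pvalue)

print(f"best p-value found across {n_specs} specs: {best_p:.3f}")
print(f"TV estimates ranged from {min(estimates):.2f} to {max(estimates):.2f}")
```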
It’s a dangerous thing when you have an analyst who can do data mining in order to get the results that they or the client wants to see. It’s something that we really advise against here at Recast.
8 – Assuming channels like branded search and affiliates are independent of other marketing activities rather than driven by them.
You can’t model branded search and affiliates in an MMM by treating them just like any other marketing channel.
I’ve seen it – modelers will plug their data directly into a regression and what will come back is that those channels are incredibly effective. Looks great, but that’s based on bad modeling assumptions.
The problem is that those channels are very tightly correlated with the amount of revenue or the number of conversions being driven. Branded search clicks generally happen at the very bottom of the funnel for someone making a purchase. So, if you have a lot of branded search clicks, it probably indicates that you also have a lot of revenue, but it’s not the case that those branded search clicks are causing the purchases.
Affiliates are paid after the conversion happens, so any affiliate spend going out the door is very tightly related to your revenue: there is only spend when there is a conversion.
If you just include those channels in your model, they’re going to soak up way too much credit because they’re not truly driving incremental conversions – or, at least, they might not be. They aren’t actually operating at the same “level” of the funnel as your other marketing channels, so your model shouldn’t just assume that they are. Instead, you want to have a model that understands the way those channels actually work and how they’re different from other types of paid media channels.
The more you spend on top-of-funnel channels, the more branded search activity that you’re going to drive and affiliate channels often work similarly.
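Here’s a toy simulation of that causal structure, with invented numbers: branded search clicks are purely a downstream symptom of demand created by top-of-funnel spend, yet a naive regression hands them most of the credit.

```python
import numpy as np

rng = np.random.default_rng(6)
n_days = 365

# Invented causal structure: top-of-funnel spend creates demand, and branded
# search clicks are a downstream symptom of that demand, not a cause of revenue.
tof_spend = rng.gamma(2.0, 500, n_days)
demand = 3.0 * tof_spend + rng.normal(0, 800, n_days)
branded_clicks = 0.02 * demand + rng.normal(0, 10, n_days)
revenue = 2000 + demand + rng.normal(0, 300, n_days)

# Naive model: drop both columns into the same regression.
X = np.column_stack([np.ones(n_days), tof_spend, branded_clicks])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)

# Branded search sits right next to the conversion, so it soaks up credit that
# actually belongs to the top-of-funnel spend that created the demand.
print(f"top-of-funnel coefficient:  {coef[1]:.2f}")
print(f"branded search coefficient: {coef[2]:.2f}")
```

In this toy world, branded search drives nothing incrementally and the top-of-funnel spend’s true ROI is 3x, yet the regression hands most of the credit to branded search.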
It’s very important to actually take into account the true causal structure as opposed to just dropping all of those columns into a linear regression and getting back very wrong results.
9 – Only updating infrequently to avoid accountability.
Don’t let your MMM consultants hide behind infrequent refreshes.
Unfortunately, I’ve seen this pattern many times. Consultants will work with brands to do an MMM project, they’ll get a snapshot of data and spend three months model building, and then they will make a presentation. But…
…when they make the presentation, the model’s results are already three months out of date. So, when the marketers go and actually take action on that data, they don’t get the results the MMM predicted.
What CMOs have told us is that the consultants will come back in and say that other stuff has changed: the economy, what your other competitors have done… So, sorry it didn’t work out, but it’s because the data is out of date. And that’s really frustrating from a marketer’s perspective. You’re paying for this advice, you follow it, it doesn’t come true… and there’s always a bunch of excuses for it.
At Recast, we think that frequent model refreshes actually generate accountability because the modeler can’t rely on the excuse that “the data is outdated and things have changed.” That’s why we spend a lot of time thinking about how we can make the model’s results really actionable, and how our customers can hold the model accountable for the predictions it makes.
No excuses – the model needs to be verifiable.
10 – Forcing the model to show results that stakeholders want to hear instead of what they need to hear.
Recast is not a “cover-your-ass” tool.
At Recast, we put the truth first because that’s what we think is best for our customers’ businesses long term. Even if that means telling uncomfortable truths in the short term.
Some MMM consultants design their offering around making the marketing team look good no matter what, or around helping a CMO justify a big investment in prestigious channels like TV even when it’s not what’s best for the business.
The truth is that with a complex statistical model, you can make the results say anything that you want if you try hard enough. Unfortunately, that doesn’t actually help to drive the business forward.
We’ve worked very hard to make our incentives align with our customers’ whole-company incentives. We have structures in place that prevent analysts from messing around with the model just to get a certain result.
This is one reason we put so much emphasis on hold-out predictive accuracy: in large part, it prevents us from pulling those sorts of shenanigans. If we “make” the model say TV is really effective when it’s actually not, we’re going to miss on our holdout forecasts consistently, and the customer will be able to see that.
An MMM shouldn’t exist just to tell the story that makes people look good, but to deliver results that can actually drive the business forward.
And, by the way, you can’t really hide things for long. In the long run, it hurts those consultants (because they lose trust) and it hurts the industry as a whole as people lose faith in MMM.
Final Thoughts:
We didn’t write this list to attack others in the industry. Recast isn’t perfect, either. We still have improvements to make in our models, and we’ll keep finding more to make in the future. But these are mistakes that can lead to millions of dollars of wasted marketing spend, and we hope brands will now be mindful of them.