Statistical modeling is hard. There are countless ways for a model to go wrong and very few ways for it to go right. How do we know if the model we’re using is appropriate for the question we’re trying to answer? How do we validate it if the ground truth is unknown?
We’ve written about model validation a lot, but this article will focus on what should be your first step if you’re doing any type of statistical inference work – parameter recovery. It’s an incredibly important technique for all types of statistical modeling problems and will help you know where your method works, and where it doesn’t.
The Role of Parameter Recovery in Enhancing Model Accuracy
One of the things that makes marketing mix modeling really challenging is that the ground truth that we care about, the true incrementality of our marketing channels, is unobserved. It’s unknown and unknowable.
So we are trying to build a statistical model where it’s impossible to know if we’ve gotten the correct answer, right? Well, not exactly.
What if we did know what the right answer was? Then we could run our model and compare the model’s results to the truth, and see how well we did. That’d be helpful, right?
Well, that’s what we can do with a parameter recovery exercise. The idea is that we generate fake data where we, the modelers, know exactly what the truth is, and then we test our model on the generated dataset to see if it can correctly “find” that truth.
You might be thinking: wait, that’s too easy! If I already know the answer it should be easy to build a model that can find that answer.
Well, sort of. A few points on that:
First, you want to use this parameter recovery exercise as a test of a fully automated system.
That is, you want to randomly generate new data and then run your fully automated code against that data to see if it can find the truth without human intervention.
You should run this parameter recovery exercise hundreds of times to see how well your model does across many different runs, so you know you aren’t just “getting lucky” on one run.
Second, if your model always recovers the parameters you care about, you probably aren’t being creative enough.
Every statistical model must make simplifying assumptions, and so the point of the parameter recovery exercise is to understand under what circumstances your model works well, and under what circumstances it doesn’t.
You should be able to come up with situations where your model can’t correctly recover the parameters. If you can’t, keep looking: those failure cases are what tell you where the model’s assumptions stop holding.
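Here is a minimal sketch of what such an automated loop could look like, in Python with only the standard library. Everything specific in it is an assumption made for illustration, not a real MMM: the one-channel linear model, the helper names `simulate`, `fit_roi`, and `recovery_rate`, the 10% tolerance, and the hidden demand shock used to break the model. The second scenario deliberately violates the model’s assumptions (spend rises when unobserved demand is high) so you can see recovery fail.

```python
import random
import statistics


def simulate(rng, true_roi, n_days=400, confounded=False):
    """Generate daily spend and sales with a known true ROI (hypothetical setup)."""
    spend, sales = [], []
    for _ in range(n_days):
        demand = rng.gauss(0, 1)  # hidden demand shock the model never sees
        s = 200 + 30 * rng.gauss(0, 1)
        if confounded:
            s += 30 * demand  # marketers spend more when demand is high
        y = 500 + true_roi * s + 30 * demand + rng.gauss(0, 10)
        spend.append(s)
        sales.append(y)
    return spend, sales


def fit_roi(spend, sales):
    """Ordinary least squares slope of sales on spend: cov(x, y) / var(x)."""
    mx, my = statistics.fmean(spend), statistics.fmean(sales)
    sxy = sum((x - mx) * (y - my) for x, y in zip(spend, sales))
    sxx = sum((x - mx) ** 2 for x in spend)
    return sxy / sxx


def recovery_rate(n_runs=300, tol=0.10, confounded=False, seed=0):
    """Share of runs where the estimated ROI lands within tol of the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        true_roi = rng.uniform(1.0, 5.0)  # a new ground truth every run
        spend, sales = simulate(rng, true_roi, confounded=confounded)
        estimate = fit_roi(spend, sales)
        if abs(estimate - true_roi) <= tol * true_roi:
            hits += 1
    return hits / n_runs


clean = recovery_rate(confounded=False)
broken = recovery_rate(confounded=True)
print(f"recovered within 10%: clean={clean:.0%}, confounded={broken:.0%}")
```

Under these assumed settings, the clean scenario should land within tolerance in nearly every run, while the confounded one should rarely do so, because the regression credits demand-driven sales to marketing spend. That gap is exactly the kind of result the exercise exists to surface.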
Practical Steps for Implementing Parameter Recovery
So, what does this parameter recovery process look like in practice for MMM modelers?
- First, start with research. You should be talking to marketers and marketing scientists, and reviewing the literature to try to understand what the “real world” actually looks like.
- Second, you need to generate simulated data based on the research you did in step one. Remember: the goal is to try to generate data that is as realistic as possible, not just data that you think will be easy to model! The simulated ground truth might be something like: Facebook has an ROI of 3.9x this month and 2.3x last month, and the time shift is 25 days; linear TV has an ROI of 3.6x this month and 4.2x last month, and the time shift is 14 days; and so on for each channel.
- Finally, you should build a system that makes it easy to randomly generate data and then test your modeling approach against that data. You should be able to run this program tens, hundreds, or thousands of times so that you can understand how often your model gets to the “right” answer.
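The data-generation half of such a system might be sketched as follows. This is a simplified illustration under stated assumptions: `ChannelTruth`, `draw_truth`, and `simulate_sales` are hypothetical names, a “month” is taken to be 30 days, ROI follows a random walk from month to month, and the time shift is treated as a single fixed lag in days (a real time shift would more likely be spread over several days).

```python
import random
from dataclasses import dataclass


@dataclass
class ChannelTruth:
    """Ground truth for one channel; known only to the simulator."""
    name: str
    roi_by_month: list  # true ROI for each 30-day "month"
    shift_days: int     # assumed fixed lag between spend and its effect


def draw_truth(rng, names, n_months):
    """Randomly draw a fresh ground truth for every channel."""
    truths = []
    for name in names:
        roi = rng.uniform(1.0, 5.0)
        rois = []
        for _ in range(n_months):
            rois.append(round(roi, 2))
            roi = max(0.0, roi + rng.gauss(0, 0.5))  # ROI drifts month to month
        truths.append(ChannelTruth(name, rois, rng.randint(0, 30)))
    return truths


def simulate_sales(rng, truths, n_months, base=1000.0, noise_sd=50.0):
    """Return (spend per channel, daily sales) generated from the known truth."""
    n_days = 30 * n_months
    sales = [base + rng.gauss(0, noise_sd) for _ in range(n_days)]
    spend = {}
    for ch in truths:
        spend[ch.name] = [rng.uniform(50, 150) for _ in range(n_days)]
        for day, s in enumerate(spend[ch.name]):
            landing = day + ch.shift_days  # day the effect shows up in sales
            if landing < n_days:
                sales[landing] += ch.roi_by_month[day // 30] * s
    return spend, sales


rng = random.Random(42)
truths = draw_truth(rng, ["facebook", "linear_tv"], n_months=6)
spend, sales = simulate_sales(rng, truths, n_months=6)
for ch in truths:
    print(ch.name, ch.roi_by_month, f"shift={ch.shift_days}d")
```

Because `truths` is kept alongside the generated `spend` and `sales`, whatever model you fit can be scored against it automatically, and changing the seed gives a new dataset with a new ground truth for each run.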
So, that’s the idea. This technique can, and should, be used for any type of inferential statistical modeling that you’re doing.
Applying Parameter Recovery to Marketing Mix Modeling Insights
But, what features should you include in your data-generating code for marketing mix modeling specifically? Here’s a non-exhaustive list of things you should consider:
- Does marketing channel effectiveness change over time? By how much and how fast?
- Can we generate data with time shifts and diminishing marginal return curves that are unknown to the modeler? How does that impact our results?
- What’s the relationship between seasonality and marketing effectiveness? Are they independent or do they interact?
- Are base sales fixed over time or do they evolve over time with seasonality or trend?
- Do promotional events have all of their effects on one day, or are they spread out over time? Are there pull-forward or pull-backward effects?
- Are channels truly independent or are there relationships between channels? Does spend on TV drive branded search activity?
This list is just a starting point, but hopefully it helps you think about how you can apply this parameter-recovery technique to your own model.
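As one possible starting point, a data generator covering a few of the items above could look like the sketch below. The functional forms are assumptions chosen for illustration, not the only reasonable ones: geometric carryover for the time shift (`adstock`), a simple saturating curve for diminishing returns (`saturate`), a sine wave for seasonality, and branded search spend that rises with recent TV spend. All names and numeric constants are hypothetical.

```python
import math
import random


def adstock(spend, decay):
    """Geometric carryover: each day keeps `decay` of the previous day's effect."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + decay * carry
        out.append(carry)
    return out


def saturate(x, half_sat):
    """Diminishing returns: equals 0.5 at x == half_sat and flattens toward 1."""
    return x / (x + half_sat)


def generate(n_days=365, seed=0):
    """Simulate daily data and return it together with the true channel effects."""
    rng = random.Random(seed)
    tv = [rng.uniform(0, 200) for _ in range(n_days)]
    # Channels are not independent: branded search spend rises with recent TV.
    tv_carry = adstock(tv, decay=0.5)
    search = [20 + 0.1 * c + rng.uniform(0, 10) for c in tv_carry]
    # Base sales evolve with a linear trend and a yearly seasonal cycle.
    season = [math.sin(2 * math.pi * d / 365) for d in range(n_days)]
    base = [1000 + 0.5 * d + 200 * season[d] for d in range(n_days)]
    tv_effect = [800 * saturate(c, half_sat=300) for c in adstock(tv, 0.7)]
    # Seasonality interacts with effectiveness: search works better at the peak.
    search_effect = [(1 + 0.3 * season[d]) * 400 * saturate(search[d], 50)
                     for d in range(n_days)]
    sales = [base[d] + tv_effect[d] + search_effect[d] + rng.gauss(0, 30)
             for d in range(n_days)]
    return {"tv": tv, "search": search, "sales": sales,
            "tv_effect": tv_effect, "search_effect": search_effect}


data = generate()
print(f"true TV share of sales: {sum(data['tv_effect']) / sum(data['sales']):.1%}")
```

The model under test would only see `tv`, `search`, and `sales`; the `tv_effect` and `search_effect` series are the hidden truth you compare its estimates against. Switching individual features on and off (for example, setting the interaction term to zero) is one way to find out which assumptions your model can and can’t tolerate.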