What Data is Needed for Marketing Mix Modeling (MMM)?

MMM is hard enough, but it’s impossible if you don’t have the right data.

Good, clean, and ideally automated data is critical to a good model. Without it, your model will take much longer to set up and train, and it won’t be able to report accurately or serve as a single source of truth.

So, what data do you need for it to work?

For a Recast MMM, what we need to ingest is a pretty simple table of historical marketing activity – ideally going back about two and a half years. 

It looks like this:

  • One row per day
  • One column for each marketing channel, holding the amount of spend in that channel on that day
  • One column for the business’s KPI on that day

Make sure every channel gets its own column: branded search on Google, non-brand search on Google, Google Shopping, etc. You’ll need the daily spend for each channel.

The business KPI can be revenue by day, profit by day, conversions by day, marketing qualified leads by day, or whatever metric the business measures its marketing against.
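To make the shape of that table concrete, here is a minimal sketch in Python that assembles one row per day from per-channel spend and a daily KPI. The function name `build_daily_table`, the column names, and the numbers are all made up for illustration, and filling missing spend with zero is an assumption, not a Recast requirement.

```python
from datetime import date, timedelta

def build_daily_table(spend_by_channel, kpi_by_day, start, end):
    """Return one row (a dict) per calendar day from start to end, inclusive.

    spend_by_channel: {channel_name: {date: spend}}
    kpi_by_day: {date: kpi_value}
    Days with no recorded spend get 0.0 (an assumption); days with no KPI
    get None so that gaps stay visible instead of looking like zero sales.
    """
    rows = []
    day = start
    while day <= end:
        row = {"date": day.isoformat()}
        for channel, spend in spend_by_channel.items():
            row[channel] = spend.get(day, 0.0)
        row["kpi"] = kpi_by_day.get(day)
        rows.append(row)
        day += timedelta(days=1)
    return rows

# Hypothetical example data: two channels, revenue as the KPI.
spend = {
    "google_branded_search": {date(2024, 1, 1): 120.0, date(2024, 1, 2): 95.5},
    "google_shopping": {date(2024, 1, 1): 300.0},
}
revenue = {date(2024, 1, 1): 4100.0, date(2024, 1, 2): 3875.0}

table = build_daily_table(spend, revenue, date(2024, 1, 1), date(2024, 1, 3))
for row in table:
    print(row)
```

Each printed row has the date, one spend column per channel, and the KPI, which is the layout described above.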

Here’s a template we use for our clients – feel free to make a copy and use it yourself:


That is what you need to get started. But there are complications to be aware of: 

What happens if we have channels whose daily spend can’t be tracked well?

Podcast spend, for example, can be difficult to track, whether you’re working with an agency or managing it in-house. You’ll have to go back and see when the drops happened and how much you spent on each one.
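Once you have reconstructed the drops, one simple way to get them into the daily table is to put each drop’s cost on the day it aired. The sketch below assumes exactly that; the `drops_to_daily_spend` helper and the figures are hypothetical, and other ways of allocating the spend are possible.

```python
from collections import defaultdict
from datetime import date

def drops_to_daily_spend(drops):
    """Sum the cost of every drop onto the day it aired.

    drops: iterable of (drop_date, cost) pairs.
    Days without a drop are simply absent and can be treated as zero spend.
    """
    daily = defaultdict(float)
    for drop_date, cost in drops:
        daily[drop_date] += cost
    return dict(daily)

# Hypothetical podcast drops reconstructed from invoices.
podcast_drops = [
    (date(2024, 3, 4), 2500.0),
    (date(2024, 3, 4), 1000.0),   # second show, same day
    (date(2024, 3, 18), 4000.0),
]
podcast_daily = drops_to_daily_spend(podcast_drops)
print(podcast_daily)
```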

What happens if we had any promotions and discounts?

Historical promotional events are critical data to start tracking. For example, if you ran a 20% discount for last year’s Black Friday, or a 10% discount for this 4th of July, the model needs to ingest that. If you ran any historical lift tests or experiments, you need to track those as well.
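As a rough illustration, promotions can be logged as date ranges with a discount size and then expanded into one value per day, so they line up with the daily spend table. The record format, the dates, and the rule for overlapping promotions below are assumptions for this sketch, not a description of how any particular model ingests promotions.

```python
from datetime import date, timedelta

def promo_discount_by_day(promos, start, end):
    """Return {date: discount fraction} for every day from start to end.

    promos: list of dicts with "start", "end" (inclusive) and "discount".
    Days with no promotion get 0.0. If promotions overlap, the larger
    discount is kept (an assumption made for this sketch).
    """
    out = {}
    day = start
    while day <= end:
        active = [p["discount"] for p in promos if p["start"] <= day <= p["end"]]
        out[day] = max(active, default=0.0)
        day += timedelta(days=1)
    return out

# Hypothetical promotion log; the dates are illustrative.
promos = [
    {"name": "Black Friday", "start": date(2023, 11, 24), "end": date(2023, 11, 27), "discount": 0.20},
    {"name": "4th of July", "start": date(2024, 7, 4), "end": date(2024, 7, 4), "discount": 0.10},
]

november = promo_discount_by_day(promos, date(2023, 11, 22), date(2023, 11, 28))
for day, discount in november.items():
    print(day.isoformat(), discount)
```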

What happens if we have different distribution channels for different KPIs?

You can combine them all into one, which has the benefit of being very simple to understand and set up. Or you can split them out and run a separate model for each distribution channel.

It depends on what you are optimizing for. 

If you care about knowing, for example, “how much does linear TV spend impact our sales at Amazon?” or “how is that different from how it impacts sales at Walmart?” then you’re going to need multiple underlying models.

But if you just want to focus on what drives the most revenue overall, you can add them all up. 

To be fair, there are other assumptions that go into that. If you believe that time shifts are different between your online DTC business and Target or Walmart, then summing it up might not be a good idea.

But those are tradeoffs you can work through, depending on the relative size of those distribution channels and exactly which questions you care about answering as a business.
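On the data side, the two options differ only in how the KPI column is prepared. The sketch below keeps one daily series per distribution channel (what you would feed to separate models) and also sums them into a single combined KPI. The channel names, numbers, and the `combine_kpis` helper are hypothetical.

```python
def combine_kpis(kpi_by_distribution_channel):
    """Sum daily KPI values across distribution channels into one series.

    kpi_by_distribution_channel: {channel_name: {day: value}}
    A day missing from one channel contributes nothing to that day's total.
    """
    total = {}
    for series in kpi_by_distribution_channel.values():
        for day, value in series.items():
            total[day] = total.get(day, 0.0) + value
    return total

# Hypothetical daily revenue per distribution channel.
revenue_by_channel = {
    "dtc":     {"2024-01-01": 1000.0, "2024-01-02": 1200.0},
    "amazon":  {"2024-01-01": 400.0,  "2024-01-02": 350.0},
    "walmart": {"2024-01-01": 250.0},
}

# Option 1: one combined KPI for a single model.
combined = combine_kpis(revenue_by_channel)
print(combined)

# Option 2: keep revenue_by_channel as is and model each series separately.
```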

A recommendation: use data warehouses. 

We truly recommend that every brand keep all its data in a marketing data warehouse. You’ll need it for reporting and for the many analyses you’ll want to run no matter what. We think it’s a worthwhile investment to get it set up, whether you work with Recast or not, and even if you don’t do MMM.
