Automated Data Quality Checks at Recast

Garbage in, garbage out. For a media mix model to have any chance of working, it has to be fitted on high-quality, accurate data.

Keep in mind: marketing mix models are holistic models operating at an aggregate level, not at an individual channel level. That means that if any of the data are incorrect, the whole model is in jeopardy.

Bad data from one small marketing channel can ripple outward and distort every other estimate in the model, so it's critically important to have a process and tools for validating the quality of the data that feed the model and your business's broader incrementality system.

In the context of marketing science in general, the data that we're working with fall into two buckets:

  1. Data on marketing activity: spend or impressions in marketing channels, pricing promotions, etc.
  2. Data on a business KPI: revenue, profit, leads, new subscriptions, etc.
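To make the two buckets concrete, here is a minimal sketch of what daily-grain inputs like these might look like. The field names and values are illustrative assumptions, not Recast's actual schema:

```python
from datetime import date

# Hypothetical daily MMM inputs: one record per channel per day for
# marketing activity, plus a KPI series at the same daily grain.
marketing_activity = [
    {"date": date(2024, 1, 1), "channel": "paid_search", "spend": 1200.0, "impressions": 48_000},
    {"date": date(2024, 1, 1), "channel": "podcast",     "spend": 300.0,  "impressions": 9_500},
]

business_kpi = [
    {"date": date(2024, 1, 1), "revenue": 58_400.0, "new_subscriptions": 212},
]
```

Both tables share the same date grain, which is what lets the model (and the quality checks described below) line activity up against the KPI.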

There are a few aspects to data quality to keep in mind:

  • Completeness: do the data fully cover the activity for the time period being discussed?
    • Example: if you are missing data on podcast activity from one quarter last year when you switched media agencies, then your podcast data are not complete.
  • Stability: are the data consistent over time, or do they tend to “shift”?
    • Example: if you are constantly re-calculating what counts as a “marketing qualified lead” and applying that definition back in time, then those data are not stable.
  • Consistency: are the data consistent with other tools, systems, or reports?
    • Example: the finance team shows different blended ROI numbers when compared to the numbers in the marketing data warehouse.

In order to help our customers effectively use Recast to operate an incrementality system, we’ve built a series of automated data quality checks that detect issues with completeness and stability while ensuring that the data in Recast are consistent with other systems.

First, let's talk about consistency. Recast is a data-warehouse-first incrementality solution, which is to say that Recast does not connect to any external tools other than your data warehouse. Recast doesn't connect to Meta or Google or Shopify or any other tools. That's because we believe deeply in the importance of consistency. We want Recast's reports to line up exactly with the numbers in the marketing teams' and executive teams' dashboards, so we work directly off of the source of truth in the data warehouse.

If Recast were to use a different definition of "Meta retargeting" than what's used internally, then the numbers and reports wouldn't line up and the data wouldn't be consistent. Additionally, Recast makes it very easy to compare the data in Recast with other sources so that marketers can quickly validate that the data "match up" with their internal source of truth and ensure that there aren't any underlying data issues.
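A consistency comparison like this boils down to checking that per-channel totals from two systems agree within some tolerance. Here is a minimal sketch of that idea (the function name, tolerance, and channel names are illustrative assumptions, not Recast's implementation):

```python
def totals_mismatches(warehouse_spend, dashboard_spend, tolerance=0.005):
    """Compare per-channel spend totals from two systems and return the
    channels whose totals disagree by more than a relative tolerance
    (0.5% by default). An empty dict means the sources are consistent."""
    mismatches = {}
    for channel in set(warehouse_spend) | set(dashboard_spend):
        a = warehouse_spend.get(channel, 0.0)
        b = dashboard_spend.get(channel, 0.0)
        denom = max(abs(a), abs(b), 1e-9)  # avoid dividing by zero
        if abs(a - b) / denom > tolerance:
            mismatches[channel] = (a, b)
    return mismatches
```

When the totals agree, the check returns nothing; when, say, the finance dashboard shows 20% more podcast spend than the warehouse, the channel is flagged with both numbers so someone can trace the discrepancy.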

For stability and completeness, Recast has built some automated tools that help companies identify and detect data issues before they have a chance to cause issues in the MMM. These automated data quality checks cover a range of common data issues and include the following:

  • Stability: automatically detect if data have changed from one week to the next.
    • Example: It looks like spend in this channel in January 2023 is different this week from last week. Is that a correct update or an issue in your data pipeline?
  • Completeness: check for Null, zero or NA values:
    • Example: Channel X is showing null values for certain time periods. Does that represent missing data, or should those be zeros?
  • Completeness: automatically detect if it seems like some data are missing when they shouldn't be:
    • Example: last week spend in channel X went to zero. Did you turn off this channel or is it a data pipeline issue?
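The three checks above can each be sketched in a few lines. This is a simplified illustration of the logic, assuming weekly snapshots of a daily series keyed by date; it is not Recast's actual code:

```python
def stale_history_changed(prev_week, this_week):
    """Stability: flag historical dates whose values changed between two
    weekly snapshots of the same series (e.g. January 2023 spend that
    looks different this week than it did last week)."""
    return sorted(d for d in prev_week
                  if d in this_week and prev_week[d] != this_week[d])

def missing_values(series):
    """Completeness: flag dates carrying null/NA values that may
    represent missing data rather than true zeros."""
    return sorted(d for d, v in series.items() if v is None)

def sudden_zero(series, window=7):
    """Completeness: flag a series whose most recent `window` days are
    all zero while earlier history was not -- was the channel turned
    off, or did the pipeline break?"""
    dates = sorted(series)
    recent, earlier = dates[-window:], dates[:-window]
    return (len(earlier) > 0
            and all(series[d] == 0 for d in recent)
            and any(series[d] not in (0, None) for d in earlier))
```

In practice each flag becomes a question back to the customer, since only they can say whether a change is a legitimate restatement or a pipeline bug.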

While these checks may seem obvious, they're actually a huge timesaver for our customers, helping them identify issues in their data pipelines that they might not have caught if not for Recast. Additionally, by running these checks before the data make it into the model, Recast avoids the wild goose chases that can happen when faulty data generate unreasonable model estimates.
