In the marketing world, there are four main ways to measure marketing effectiveness.
- Tracking, which focuses on telemetry, and has the goal of assigning customers, orders, and even order items, to specific channels that get “credit” for the conversion.
- Experiments, which involve (you guessed it) running an experiment with control and treatment groups across experimental units (individuals, households, geographies, stores, etc.), with the goal of estimating a treatment effect.
- Surveys, which ask customers directly where they heard about a product or service, and track responses over time to establish changes in consumer sentiment.
- Modeling, which uses observational data combined with a statistical model to produce coefficients that parameterize the relationship between channel spend and the relevant business metric (new customers, revenue, etc.).
Each of these approaches has its strengths and weaknesses, and all have their place in a modern measurement system. Below, we take on each of them in turn, and then summarize how we think they should fit together in an ideal organization.
The goal of digital tracking attribution systems is to assign credit for each conversion (I’m using “conversion” as a placeholder for whichever metric is most important, which could be acquiring new customers, sales, etc.) with as much granularity as possible. What makes them different from experimental and regression approaches is that, at least in principle, they assign credit for every conversion unit. We care, for example, that customer 1001 is assigned to Facebook, and not Google. Experimental and regression approaches, by contrast, do not assign individual customers to channels and have nothing specific to say about customer 1001; they only estimate systemic relationships between spend in a given channel and the aggregate level of conversions. This is why we sometimes call attribution a “bottom up” approach, contrasted with experimentation and regression, which are “top down”.
In fact, the goal of assigning credit for every conversion is the defining feature of attribution, no matter the telemetry (e.g. UTM codes, vanity URLs, discount codes) or the system for assigning credit (e.g. a simple last click model, or a fancy Hidden Markov Model-based multi-touch approach).
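To make the "credit for every conversion" idea concrete, here is a minimal sketch of the simplest scheme mentioned above, a last-click model. The touch log, customer IDs, and channel names are made up for illustration, and real systems would pull these from whatever telemetry is in place.

```python
from collections import Counter

# Hypothetical touch log: (customer_id, timestamp, channel). All values are invented.
touches = [
    (1001, 1, "google"),
    (1001, 5, "facebook"),
    (1002, 2, "google"),
    (1003, 3, "direct_mail"),
    (1003, 4, "google"),
]
converted = {1001, 1003}  # hypothetical set of customers who converted


def last_click(touches, converted):
    """Assign each converted customer to the channel of their most recent touch."""
    latest = {}
    for customer, ts, channel in touches:
        if customer not in converted:
            continue
        if customer not in latest or ts > latest[customer][0]:
            latest[customer] = (ts, channel)
    return {customer: channel for customer, (_, channel) in latest.items()}


credit = last_click(touches, converted)    # one channel per converted customer
channel_totals = Counter(credit.values())  # conversions credited to each channel
```

Every converted customer ends up with exactly one channel, which is the "bottom up" property described above. Note that customer 1001's earlier Google touch gets no credit at all under this rule.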
On the plus side, attribution models are typically the easiest to set up, and the easiest for stakeholders to understand. This makes them invaluable for organizations.
However, there are some serious downsides.
- It’s generally accepted that touch-based attribution models over-credit the “bottom of the funnel” channels (the ones a customer is likely to see right before their purchase) and under-credit those at the top. To fix this, many teams end up using fudge factors that can’t be validated to try to “make the numbers look more reasonable”.
- Using discount codes can lead to material losses in revenue as the codes get distributed through coupon code aggregators like Honey.
- Finally, and this is subtle, attribution models suffer from the problem that correlation does not imply causation. Marketers shouldn’t care about how spend and conversions are correlated, but rather about what will happen if they decide to increase their budget. The distinction is important, and many marketers may not be aware that there is a distinction to be made at all.
When the number of channels gets large, the number of different ways to attribute customers to those channels gets larger and more unwieldy, and requires modeling and assumptions to synthesize.
Experiments, since they use randomly assigned treatment and control groups, do estimate causal effects, and not just correlations. It is for this reason that they are the gold standard for estimating true incrementality — when they’re doable, and when you believe the setting in which you ran the experiment generalizes to the ones you care about.
Before I explain that last sentence, let’s back up and quickly go over what experiments (often called “lift tests”) mean in this context. We need to distinguish between “intra-channel” and “inter-channel” tests. “Intra-channel” tests refer to experiments that pick out the best version of an ad, but the interpretation is restricted to that channel itself. It tells you, for example, which is the best mailer to send out, but it doesn’t tell you what would happen if you were to increase your direct mail budget by 20%.
“Inter-channel” tests measure the effect of changing the levels of spend for channels relative to each other. This matters more for budgeting and planning than for optimizing a channel within a fixed constraint.
Let’s say that you want to measure the effect of your TV ads. A reasonable thing to do would be to turn off spend in a randomly-selected set of DMAs, and then see how conversions in those DMAs are affected. The dip in sales in those DMAs could be said to have resulted from the reduction in spend on TV. This is a slight oversimplification, as you’d have to use a statistical approach to estimate the effect, but the approaches are well understood and are relatively straightforward to implement.
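As a rough illustration of the TV example, the sketch below compares the percent change in conversions in the DMAs where spend was turned off against the percent change in the DMAs left alone (a simple difference-in-differences). The DMA names and numbers are invented, and this is only one of several possible estimators; a real analysis would also need an uncertainty estimate.

```python
from statistics import mean

# Hypothetical average weekly conversions per DMA: (pre-period, test-period).
treated = {"dma_a": (100.0, 80.0), "dma_b": (200.0, 170.0)}   # TV spend turned off
control = {"dma_c": (150.0, 150.0), "dma_d": (120.0, 126.0)}  # TV spend unchanged


def pct_change(pre, post):
    """Relative change from the pre-period to the test period."""
    return (post - pre) / pre


def diff_in_diff(treated, control):
    """Mean percent change in treated DMAs minus mean percent change in control DMAs."""
    treated_change = mean(pct_change(pre, post) for pre, post in treated.values())
    control_change = mean(pct_change(pre, post) for pre, post in control.values())
    return treated_change - control_change


# Negative values mean conversions fell relative to control when TV was switched off.
effect = diff_in_diff(treated, control)
```

Subtracting the control change is what keeps a market-wide swing (seasonality, a competitor's promotion) from being mistaken for the effect of the TV spend.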
This is great! Why don’t we do this all the time, for every channel? Well — unfortunately, while experimental methods provide the best approach in principle, they often don’t fare so well in practice. Here’s why:
- They only work for one channel at a time. To estimate the effect of one channel, you’ll want to keep everything else the same, including the spend in other channels. Since each test also takes time to run, getting experimental validation of every channel is prohibitive for any real-world decision-making cycle.
- They require substantial people-power to run. Unlike attribution and regression models, which work with observational data, experiments require an active intervention. DMAs need to be chosen, spend needs to be turned off or on, etc. In practice, this can be a non-trivial amount of work, and many teams don’t have the capacity or expertise to make it happen very frequently.
- The setting you test in may not be the setting you care about. A common scenario we see with seasonally-driven companies is that the stakes are too high to test during their most profitable periods (e.g., between Thanksgiving and Christmas). They may test in June, when their most important weeks are in November and December. Will these results generalize? It’s anybody’s guess.
So this is why we say that experimental methods are best — when they’re feasible, and when you’re confident the results will generalize to when they matter most.
If you want to know where your customers come from, why not just ask them? This is usually the method most companies start with informally, because it’s context-rich (they can tell you why they decided to click) and reliable enough to begin with.
Eventually companies move to formalize this approach to get more consistently recorded results (sales teams are notoriously unmotivated to do data entry). A “How Did You Hear About Us” (HDYHAU) question in the checkout flow or on the thank you for ordering page is a favorite in ecommerce. In B2B it’s one of the questions most consistently asked in webinar signup forms and on ‘book a demo’ or ‘contact us’ pages.
Once companies reach maturity, the very top of the funnel (brand awareness and recall) becomes the focus: the bottom of the funnel is well optimized and the business turns a good profit, so the last hurdle is conquering mindspace. Panel surveys are useful for this, as you can ask a representative sample of the population whether they’ve heard of you, and track that percentage over time to spot trends.
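Tracking an awareness percentage over time only helps if you can tell a trend from sampling noise. The sketch below computes the share of respondents who have heard of the brand in each survey wave, with a normal-approximation 95% interval. The wave labels and counts are hypothetical, and the interval assumes a simple random sample.

```python
from math import sqrt

# Hypothetical panel waves: (label, respondents, respondents aware of the brand).
waves = [("wave_1", 1000, 120), ("wave_2", 1000, 150)]


def awareness(n, aware, z=1.96):
    """Return (share aware, lower bound, upper bound) using a normal approximation."""
    p = aware / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


results = {label: awareness(n, aware) for label, n, aware in waves}

for label, (p, low, high) in results.items():
    print(f"{label}: {p:.1%} aware (95% interval {low:.1%} to {high:.1%})")
```

If the intervals for two waves overlap heavily, the apparent change may be nothing more than sampling variation.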
This brings us to the final pillar: regression-based approaches. Like attribution models, they work with observational data, so they require less disruption to your existing business. And like experimental methods, their goal is to estimate causal effects, not just correlations.
Regression approaches take as their input time series of channel-level spend, of control variables (like seasonal indicators), and of the relevant conversion metric (say, new customers acquired), and output estimates of the incremental effect of additional dollars spent.
Not only that, but different variations of these models also estimate lagged effects, cross-channel effects, geographic-level effects, and more!
If this sounds too good to be true, the catch is that these models are enormously complex, require lots of assumptions, have a huge number of parameters to fit, and must do so with limited data. This makes them tricky to estimate, unstable (they give different results when the data is changed only slightly), and difficult to validate.
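For intuition, here is a deliberately stripped-down sketch of the regression setup: ordinary least squares of new customers on two channels' spend plus a holiday indicator. The data are synthetic and noise-free (built from known coefficients), so the fit recovers them exactly. Real models add lags, saturation, and noise, which is where the difficulties described above come from. The channel names and numbers are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical weekly inputs (spend in thousands of dollars; holiday is a 0/1 control).
spend_tv = [10, 12, 8, 15, 11, 9, 14, 13]
spend_search = [5, 4, 6, 5, 7, 3, 6, 4]
holiday = [0, 0, 0, 1, 0, 0, 1, 0]

# Synthetic outcome built from known coefficients so the fit can be checked.
new_customers = [
    50 + 2.0 * tv + 0.5 * s + 10 * h
    for tv, s, h in zip(spend_tv, spend_search, holiday)
]

# Design matrix columns: intercept, TV spend, search spend, holiday indicator.
rows = [[1.0, tv, s, h] for tv, s, h in zip(spend_tv, spend_search, holiday)]
beta = ols(rows, new_customers)
```

Here `beta[1]` and `beta[2]` play the role of the "incremental effect of additional dollars spent" for each channel. With only eight weeks of data and four parameters, even small amounts of noise would move these estimates noticeably.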
What’s the right mix?
Many companies are already employing a mix of these approaches. How should they be integrated in a principled way? While there’s no right answer, we think about it in the following way:
First, we should remember that all this measurement is in service of figuring out how to allocate our marketing budget this week, this month, or this quarter. While in a vacuum we may wish to get the best numbers possible, in the nitty-gritty of the real world the numbers need to be both actionable and timely.
This unfortunately precludes experimentation from forming the foundation of a marketing effectiveness system, since it is often too slow to fit inside the budgeting cycle.
Second, all three of the remaining approaches (surveys, attribution, and regression) have weaknesses: surveys in that people are notoriously unreliable when answering them; attribution in that it models correlations, not causal effects; and regression in that the models are difficult to estimate and validate.
In our view, this suggests having multiple measures that can provide different perspectives. That is, each methodology is a lens through which to view performance, and the best marketers incorporate the lessons they get from each method into their decision-making process.