Cleaning URLs for Accurate Channel Attribution

Anyone who’s ever run an online marketing campaign has encountered URL parameters. These are extra key-value pairs added to the end of a URL, usually to track which marketing source people came from when they click on your ads, emails, or other campaigns. Because Google Analytics is the most widely used analytics platform (by some estimates, it runs on over 60% of websites), these are normally UTM parameters, as in the example below:

https://www.example.com?utm_source=facebook&utm_medium=cpc&utm_campaign=lookalike

In this example, each UTM parameter is passed along to Google Analytics (or whatever analytics platform you’re using) and recorded in a separate field that you can use to filter for specific campaigns and see how they’re performing. For example:

  • `utm_source=facebook` tells us this visitor came from Facebook
  • `utm_medium=cpc` means they came from a paid ad (cpc stands for ‘cost per click’)
  • `utm_campaign=lookalike` identifies the specific campaign they clicked on
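To make that mapping concrete, here is a minimal sketch of how a script on the landing page could read these fields itself, using the standard `URL` and `URLSearchParams` APIs (available in browsers and Node.js). The function name `readUtmParams` is hypothetical; in practice your analytics platform does this parsing for you.

```javascript
// Hypothetical helper: extract the standard UTM fields from a landing-page URL.
function readUtmParams(href) {
  const params = new URL(href).searchParams; // decodes %xx and '+' for us
  const fields = ['source', 'medium', 'campaign', 'term', 'content'];
  const result = {};
  for (const field of fields) {
    const value = params.get('utm_' + field);
    if (value !== null) result[field] = value; // only keep parameters that are present
  }
  return result;
}

// Usage with the example URL above
console.log(readUtmParams(
  'https://www.example.com?utm_source=facebook&utm_medium=cpc&utm_campaign=lookalike'
));
// → { source: 'facebook', medium: 'cpc', campaign: 'lookalike' }
```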

You can also have `utm_content` and `utm_term`, as well as parameters added by other platforms such as `fbclid` (Facebook click ID) and `gclid` (Google click ID), and any custom parameters you add for your own purposes. This system works well enough for most marketers because management can easily understand it. We can see how many people clicked on an ad and then went on to purchase, so we know how well each campaign performed. Hard to argue against. Right? Wrong.

The hidden flaw in UTM parameter tracking

There’s a significant flaw in this system (as there is in every attribution method), and it’s hiding in plain sight. Most marketers have simply never thought about it, or perhaps they don’t want to, because as Upton Sinclair lamented, “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!” The problem occurs when people share those URLs.

Let me walk through a common example. You see the latest Marketing Measurement Roundup Newsletter from Recast pop up in your inbox, and you think “this post looks great, let me take a coffee break and read it”.

You click the ‘Read Article’ link and read the blog post. It’s so good that you decide to share it with a colleague, so you copy the link from your browser’s address bar and paste it into a message. It looks like this:

https://getrecast.com/google-analytics-vs-surveys/?utm_source=newsletter&utm_medium=email&utm_term=2022-09-07&utm_campaign=The+marketing+measurement+roundup+from+Recast

OK, no big deal: these are the UTM parameters we talked about earlier. They just tell Google Analytics that you came from the email newsletter, so the fine people at Recast can tell how well the newsletter is doing at driving traffic. If you don’t see the problem yet, here it is: when you share the link, which UTM parameters will your colleague arrive with? That’s right, the same ones, so they’ll also be counted as email. See the problem? Your colleague, who actually came via word of mouth and should be counted as ‘direct’ traffic, is now treated as having come from the newsletter.
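The mechanics are easy to see in a toy model of channel assignment. The sketch below is an assumption for illustration only, not Google Analytics’ actual channel-grouping rules: if a `utm_medium` is present it wins; otherwise a visit with no referrer counts as direct, and anything else as referral. The function name `classifyVisit` is hypothetical, and the link is a shortened version of the one above.

```javascript
// A deliberately simplified channel classifier (NOT Google Analytics' real
// channel-grouping rules), just to show what the analytics tool can "see".
// href: landing-page URL; referrer: document.referrer ('' when none is sent).
function classifyVisit(href, referrer) {
  const medium = new URL(href).searchParams.get('utm_medium');
  if (medium) return medium;      // tracking parameters take priority
  if (!referrer) return 'direct'; // no parameters and no referrer
  return 'referral';
}

const copiedLink =
  'https://getrecast.com/google-analytics-vs-surveys/?utm_source=newsletter&utm_medium=email';

// Your click from the newsletter and your colleague's click on the pasted link
// look identical: same URL, and often no referrer from email or chat apps.
console.log(classifyVisit(copiedLink, ''));  // → email

// The same visit on a cleaned link falls back to 'direct'.
console.log(classifyVisit('https://getrecast.com/google-analytics-vs-surveys/', '')); // → direct
```

Because the classifier only ever sees the URL and referrer, nothing distinguishes the original reader from everyone they share the link with.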

It’s a bigger problem than you think. Every marketing campaign has its own tracking parameters, and the vast majority of people copy the full URL (nerds like me who strip the parameters before sharing are rare), so your campaigns get more credit than they should every time this happens. Some users bookmark pages with these parameters still attached, then visit or share them hundreds of times over weeks, months, or years, each time adding to the tally of a long-forgotten newsletter. Maybe you could argue that’s fair: after all, the email did drive me, and then I drove my colleague to the site, so doesn’t the email deserve all that credit? Maybe partially, but should it really be 100%?

Email is getting too much credit

Consider the following thought experiment. Say one blog post was particularly well written. So well written, in fact, that lots of people share it, and it goes viral. Now you’re in the enviable position of having thousands of people coming to your blog post, but they’re all labeled email. Then you have to make a budget decision: do you give a promotion to the person running your email newsletter or to the one writing your blog posts? If you weren’t aware of this phenomenon, you might mistakenly promote the email person. They might claim the post did well because they worked really hard on the email subject line, and you wouldn’t have any evidence to counter that logic. So the person actually creating the value doesn’t get the credit.

Advertising ROI is worse than you thought

Consider a second thought experiment, this time with advertising. You’re spending $100,000 a month driving tens of thousands of clicks to your website. One of those thousands of people copies the URL and shares it with a friend via WhatsApp. That friend just happens to be Kim Kardashian. Your product goes viral, getting millions of visitors, and *as measured by UTM parameters*, your advertising ROI goes from barely breakeven to a 100x return on investment. How much credit does your advertising agency deserve for that? Do you pay out their annual bonus early because they already hit their target? Sure, they deserve some credit for getting your product in front of an influencer (maybe that was even an explicit strategy), but probably not all of the credit for millions of dollars of sales.

How to fix this: a URL cleaner script

Don’t worry, there’s a way to fix this: strip the tracking parameters from your URLs after the visitor lands on your website. You can write some custom JavaScript that scrubs the tracking codes from the URL after your analytics has loaded and recorded those values, but before the user has had a chance to copy and share the URL. An example snippet can be found below, modified from one created by Website Advantage.

<script>
function utmRemover() {
	var cleanSearch = window.location.search
		.replace(/utm_[^&]+&?/g, '') // removes utm_xxx parameters
		.replace(/&$/, '')  // removes & if last character
		.replace(/^\?$/, '');  // removes ? if only remaining character

	// Keep the existing history state and the #fragment, which pathname + search alone would drop
	window.history.replaceState(window.history.state, '', window.location.pathname + cleanSearch + window.location.hash);
}
setTimeout(utmRemover, 2000); // remove after 2 seconds
</script>
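If you also want to remove the click IDs mentioned earlier, a variant built on the standard `URL` API avoids some regex edge cases, such as `utm_` appearing inside another parameter’s value. This is a sketch of an alternative, not the Website Advantage original; the names `cleanTrackingParams`, `isTrackingParam`, `removeTrackingParams` and the `EXTRA_PARAMS` list are assumptions you should adapt to the parameters your own platforms add. Like the snippet above, it would go inside a `<script>` tag.

```javascript
// Parameters to strip besides any utm_* key. fbclid/gclid are the Facebook and
// Google click IDs; this list is an assumption, so extend it with your own.
const EXTRA_PARAMS = ['fbclid', 'gclid'];

function isTrackingParam(key) {
  return key.startsWith('utm_') || EXTRA_PARAMS.includes(key);
}

// Pure function: returns href with tracking parameters removed, keeping other
// query parameters and the #fragment. Returns href untouched if nothing matches.
function cleanTrackingParams(href) {
  const url = new URL(href);
  const entries = Array.from(url.searchParams);
  if (!entries.some(([key]) => isTrackingParam(key))) return href; // nothing to strip

  const kept = new URLSearchParams(entries.filter(([key]) => !isTrackingParam(key)));
  // Re-serializing may normalize encoding (e.g. %20 becomes +).
  // Assigning '' removes the '?' entirely when no parameters remain.
  url.search = kept.toString();
  return url.toString();
}

// Browser wiring: rewrite the address bar without reloading the page.
function removeTrackingParams() {
  const cleaned = cleanTrackingParams(window.location.href);
  if (cleaned !== window.location.href) {
    window.history.replaceState(window.history.state, '', cleaned);
  }
}

if (typeof window !== 'undefined') {
  setTimeout(removeTrackingParams, 2000); // same 2-second delay as above
}
```

Keeping the string manipulation in a pure function makes it easy to test outside the browser before wiring it to `history.replaceState`.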

This cleans every URL after 2 seconds, which should give analytics enough time to run. However, you can be smarter about when this is triggered, for example by checking that Google Analytics (or whatever analytics platform you use) has run before you call `utmRemover()`. You could also add custom logic to track and investigate these visitors further, such as intercepting the parameters before analytics records them and rewriting them to indicate their unique status (for instance, `email-sharing` rather than just `email`). All of this can be done with custom JavaScript, but I recommend making it more transparent to non-technical people on the team by using tag sequencing in Google Tag Manager.
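Rather than relying on a fixed delay, one option is to poll for a sign that your analytics code has done its work, with a timeout as a fallback. This is a minimal sketch: `whenReady` is a hypothetical helper, and the readiness check is an assumption you need to supply for your own setup. For example, with Google Tag Manager you might check that the `gtm.load` event has been pushed to `window.dataLayer`, but that only tells you the container has loaded, not that a specific analytics hit was sent.

```javascript
// Hypothetical helper: resolves true once isReady() returns true, or false if
// timeoutMs passes first. Checks immediately, then every intervalMs milliseconds.
function whenReady(isReady, { intervalMs = 100, timeoutMs = 5000 } = {}) {
  return new Promise((resolve) => {
    const start = Date.now();
    (function check() {
      if (isReady()) return resolve(true);
      if (Date.now() - start >= timeoutMs) return resolve(false);
      setTimeout(check, intervalMs);
    })();
  });
}

// Example wiring (assumption: GTM is installed and pushes a 'gtm.load' event).
// utmRemover() is the function from the snippet above.
if (typeof window !== 'undefined') {
  whenReady(() => Array.isArray(window.dataLayer) &&
      window.dataLayer.some((entry) => entry && entry.event === 'gtm.load'))
    .then(() => utmRemover()); // clean the URL whether or not the check succeeded
}
```

Cleaning the URL even when the check times out is a deliberate choice here: it trades a small risk of analytics missing the parameters for never leaving a shareable tracked URL in the address bar.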

In my experience, implementing this utmRemover script (or something similar) has reclassified about 10-20% of the traffic that was previously attributed to other sources. This makes sense, because according to consumer surveys word of mouth still drives about a fifth of all purchases, and it likely influences considerably more. Much of that traffic arrives as direct, or is attributed to organic search when people search for your brand name, but some of it is misattributed in this way to your email, advertising, and other marketing campaigns.
