Posts

003 | The Past, Present and Future of Online Attribution


When it comes to measurement, the marketer's goal has always been the same: to quantify the impact of marketing on commercial metrics in order to validate and optimise marketing strategy and channel mix.


Using data to drive decision making is what first attracted me to digital marketing. I could apply my mathematical background with creative problem solving in order to solve key business challenges. Almost twenty years ago I created a multi-touch, cross-channel attribution model for First Choice Holidays (part of the TUI Travel Group). Due to the processing power required of a standard Windows PC, it had to run overnight on an isolated machine every Thursday so that I could analyse the results with a bacon sandwich on a Friday morning.


Today online attribution faces its biggest challenges yet. The fundamental tools required to make connections between the data are going away to conform with new regulations and a more privacy-conscious consumer. Before examining the future of digital marketing attribution within this landscape, I wanted to spend some time looking back at the journey that brought us to this point. Today's challenges may be significant, but this history demonstrates that the industry has been able to meet and overcome some sizable problems to get here. It is this same creative thinking, combined with technological innovation, that will enable marketers to continue to use attribution models to make the right decisions for their business.


It's almost impossible to construct a fixed timeline that details the full history of attribution. Organisations of varying sizes developed new models and technologies at different times, and many of these were released independently of one another. Therefore the below is told through the lens of my own exposures and experiences over a 20-year career in digital marketing. I'd love to be challenged, further educated or corrected by anyone with greater expertise in this field. I love to learn.


---


Sales Attribution
c.2000


With the rise of digital marketing in the early 2000s, effectiveness was initially judged by the number of times that an ad was shown compared to the forecast. Advertisers would buy impressions at a certain cost per thousand (CPM = cost / impressions × 1,000) and measure whether or not the publisher could deliver the desired quantity.


In 2002 the growth of the pay-per-click model enabled clicks to become the dominant metric for understanding ad effectiveness. Advertisers now understood how many people had taken an action directly on their ad, with the most common result being a visit to their website. These models introduced the metrics clickthrough rate (CTR = clicks / impressions) and cost per click (CPC = cost / clicks).


Analytics organisations such as Omniture and WebTrends led the way in further evolving this model. If a visit to a website could be recorded, then additional onsite activity could also be captured, including, crucially, visits to the sales confirmation page. The ability to connect these 'converters' with the marketing touchpoints they had used along the journey was the foundation of online sales attribution. A small data file (a cookie) deployed on a consumer's computer allowed websites to identify them and facilitate the connections between the initial click and the action.


With the tools to make connections between sales and marketing activity, new metrics became available to online marketers, with cost per acquisition (CPA = marketing cost / acquisitions or actions) used by the majority of organisations. Organisations now understood at a high level how many of their sales or actions were driven by their digital marketing activity.
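For illustration, here is a minimal Python sketch of how these core metrics fit together, using purely hypothetical campaign figures:

# Hypothetical campaign figures, purely for illustration
impressions = 250_000
clicks = 5_000
conversions = 125
cost = 3_750.00

cpm = cost / impressions * 1_000   # cost per thousand impressions
ctr = clicks / impressions         # clickthrough rate
cpc = cost / clicks                # cost per click
cpa = cost / conversions           # cost per acquisition / action

print(f"CPM £{cpm:.2f}, CTR {ctr:.2%}, CPC £{cpc:.2f}, CPA £{cpa:.2f}")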



Multichannel Attribution

c.2005


Recognising that sales and revenue could be directly associated with marketing activity was powerful, but challenges soon emerged. As access to information increased, consumers would interact with an increasing number of marketing touchpoints before making a purchase. Allocating the full value of the sale to every single touchpoint led to a biased view of the digital marketing channel overall. I recall regularly viewing marketing analytics results showing sales recorded through digital marketing were more than double the total sales transacted via the website on the day!


The last-click attribution model became the solution to these challenges. Essentially, the last known click on a tracked digital marketing advert from any publisher would receive the full credit for the sale made. Despite universal recognition of the limitations of this model, something I wrote about back in 2009, it became the industry standard for many years.


It was these limitations that led me to develop the cross-platform model used at TUI. This model distributed the value of the sale equally across every recorded touchpoint in the user's online journey to purchase. The model was also adaptable depending on the business need: to drive the most efficient sales it could be weighted towards the last click, or to increase new leads towards the first click. Over the years, similar rules-based models have become available in the majority of online analytics and ad platforms.
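As a rough illustration of how such a rules-based model can work (a simplified sketch, not the original TUI implementation, and the 60/40 weights are invented), a sale's value can be spread across the journey and optionally skewed towards the first or last click:

def attribute(sale_value, touchpoints, weighting="linear"):
    # Split sale_value across an ordered journey of touchpoints.
    # "linear" = equal split; "first" / "last" give 60% to that touchpoint
    # and share the remaining 40% equally across the others (illustrative weights).
    n = len(touchpoints)
    if weighting == "linear" or n == 1:
        shares = [sale_value / n] * n
    else:
        favoured = 0 if weighting == "first" else n - 1
        shares = [sale_value * 0.4 / (n - 1)] * n
        shares[favoured] = sale_value * 0.6
    return dict(zip(touchpoints, shares))

print(attribute(100, ["generic_search", "social", "branded_search"], weighting="last"))
# {'generic_search': 20.0, 'social': 20.0, 'branded_search': 60.0}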



Cross-Platform / Cross-Device

c.2010


As the rise of smartphones accelerated during the 2010s, a new problem emerged. Marketing touchpoints increasingly started to occur across multiple devices. These interactions could not be associated back to the same user, and as such the path to purchase became fragmented. This was further complicated as media interactions did not always occur via a traditional browser. A user could click on an advert within the Facebook app but later make a purchase via their desktop browser, yet the two events could not be connected.


New deterministic methods were required to try and match the user across their devices. Publishers and platforms would increasingly advocate for users to log in to their services to allow them to identify users across their widening portfolio of devices. If a user was logged into their accounts on every device used throughout their journey, then conversions could be attributed accurately.


Probabilistic methods were also used to complement deterministic methods, or where no logged-in first-party data existed. With this approach, models and algorithms are used to match users across their devices using a range of identifiers from mobile devices such as advertising IDs, IP addresses, browser type or language settings.
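A heavily simplified sketch of the probabilistic idea is below; the signals, weights and threshold are invented for illustration, whereas real systems rely on far richer signals and trained models:

# Hypothetical device profiles - the signals and weights are invented for illustration
profile_a = {"ip": "81.2.69.142", "browser": "Safari", "language": "en-GB"}
profile_b = {"ip": "81.2.69.142", "browser": "Chrome", "language": "en-GB"}

WEIGHTS = {"ip": 0.6, "browser": 0.2, "language": 0.2}

def match_score(a, b):
    # Sum the weight of every signal the two profiles share
    return sum(w for key, w in WEIGHTS.items() if a.get(key) == b.get(key))

if match_score(profile_a, profile_b) >= 0.7:
    print("Likely the same user across devices")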


These methods helped to provide a robust solution to cross-device attribution. 



Online to Offline 

c.2013


By this time, the majority of organisations were able to recognise how online sales were driven by digital marketing activity at a granular level. But in most verticals, the majority of sales still occurred in physical stores. Research online, purchase offline became the most common route to purchasing products.


Organisations such as Billups started to leverage data from users who had opted in to location tracking on their smartphones in order to identify movement near physical stores. The smartphone had turned from a tool that complicated online attribution to one that could help connect digital activity to offline actions. In 2014 Google introduced the in-store visits metric into AdWords to provide marketers with metrics such as cost per visit (CPV = cost / store visits).


Organisations looking to go further could also understand the impact on in-store sales. Propositions emerged that combined publisher clickstream data with offline sales information, leveraging user email addresses as the primary key. It was now possible to connect digital marketing activity with total sales regardless of channel.



Data-Driven Attribution (DDA)

c. 2015 


Recognising the role of digital marketing activity in driving both online and offline sales was incredibly powerful for organisations at a high level. However, the models used by organisations to attribute value across marketing touchpoints had not evolved at the same pace. Although advertisers understood the limitations of rules-based attribution models, the outputs of these models had become embedded within their internal reporting frameworks. “We know it’s not ideal but it’s working for us” was a common pushback from advertisers that had driven online growth through paid media.


In the early 2010s, adtech platforms started to beta test new models that used historical data to analyse the significance of digital touchpoints in generating a conversion and attribute value accordingly. These more sophisticated algorithms are able to consider factors such as the sequence of interactions, the type of interaction (view, click, impression) and the time between interactions, and then assign credit to each touchpoint based on its role in driving the conversion.
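The exact algorithms are proprietary to each platform, but the flavour of the approach can be illustrated with a simplified 'removal effect' sketch: each channel's credit is proportional to the conversions that would be lost if journeys through that channel were taken away. The journey data below is entirely invented.

# Hypothetical journeys: (ordered touchpoints, did the journey convert?)
journeys = [
    (["display", "generic_search", "branded_search"], True),
    (["social", "branded_search"], True),
    (["display"], False),
    (["generic_search"], True),
    (["social"], False),
]

def removal_effect_credit(journeys):
    total_conversions = sum(converted for _, converted in journeys)
    channels = {c for path, _ in journeys for c in path}
    effects = {}
    for channel in channels:
        # Conversions that would survive if this channel did not exist
        surviving = sum(converted for path, converted in journeys if channel not in path)
        effects[channel] = total_conversions - surviving
    # Normalise so the credit across channels sums to the observed conversions
    total_effect = sum(effects.values()) or 1
    return {c: total_conversions * e / total_effect for c, e in effects.items()}

print(removal_effect_credit(journeys))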


Today the majority of advertisers will have migrated to a data-driven attribution model, particularly for their biddable media purchasing. A key strength of DDA is that it can work in almost real-time and therefore can power bidding and buying through dynamic platforms.


---


THE BIGGEST CHALLENGE YET



Third-party cookies have served as the key enabler for attribution within digital media. Used to connect different marketing touchpoints with an eventual sale, they also enabled personalisation and retargeting and helped websites remember preferences. However, cookies are also capable of capturing far more user data and feeding this back to platforms and publishers. Their uncontrolled usage, resulting in extensive tracking and profiling of users, was the key factor behind the reform of data privacy regulations between 2016 and 2018. These regulations modernised existing data protection directives for the digital age, providing greater user rights and increased accountability for organisations using consumer data. Regulations such as the GDPR (Europe) set a high bar for data protection internationally.


The movement to a more privacy-focused digital world was not just led by regulation. Organisations evaluated their own positions on the responsible use of online data and developed their own sets of principles and ideologies. It then became incumbent on them to develop technology solutions to ensure that these principles were upheld. Apple made one of the first moves, introducing Intelligent Tracking Prevention (ITP) to its Safari browser in 2017. This prevented the ability to track individuals by blocking third-party cookies, fingerprinting and more. By the end of 2020, all iOS browsers were required to have ITP in place. By this point the mobile device had become the dominant product for accessing internet services, either via a browser or via an app. In 2020 Apple released iOS 14 with the inclusion of App Tracking Transparency (ATT). With this technology, every single app was required to prompt the user for permission in order to track their activity. Although the figure is increasing every year, only 34% of consumers opt in to this tracking.


In January 2020 Google also announced that Chrome would phase out support for third-party cookies and, after a few extensions, set the deadline for the removal as the end of 2024. Chrome is the leading internet browser in the world with a global market share of 64% (StatCounter), so the ramifications are significant for publishers, website owners and marketers.


---


THE FUTURE 

c.2025….


The impact of these changes has ramifications throughout the online world. Advertisers face reduced targeting accuracy leading to increased CPAs, they will not be able to retarget consumers to encourage them to take an action, and they will no longer be able to attribute actions across marketing touchpoints. Publishers may see lower revenues with far fewer signals to develop personalised ads, whilst technology platforms need to invest in developing or adopting new solutions.


As a result, the industry as a whole is working hard to develop privacy-centric solutions that allow the recording of online actions and for these actions to be attributed back to marketing activity. Here is a non-exhaustive summary of potential solutions that are either being proposed or prototyped.


Probabilistic Attribution

In the short term, the new limitations on data collection will move the industry further towards probabilistic approaches. With a reduced volume of observable data available, machine-learning-based attribution models will need to become increasingly predictive when attributing the value of actions across marketing touchpoints.


Authenticated First Party Data

Solutions such as Enhanced Conversions from Google look to supplement existing conversion tags by sending hashed first-party conversion data (e.g. an email address) from a website to Google in a privacy-safe way. This data can then be matched with Google data in order to record a conversion. This solution relies on the user being signed into their Google account when taking the action, with attribution only possible against Google properties.
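As a rough sketch of the kind of hashing involved (the exact normalisation rules are defined by the platform, so treat this as illustrative), the first-party value is normalised and passed through a one-way SHA-256 hash before it leaves the advertiser's environment:

import hashlib

def hash_email(email):
    # Normalise (trim whitespace, lowercase) and apply a one-way SHA-256 hash
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

print(hash_email(" Jane.Doe@example.com "))
# The raw address is never transmitted; only this irreversible hash is shared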


Identity Graph Solutions

Identity graphs also use hashed user data. An identity graph is essentially a digital map that connects data points about individuals to create a unified view of the user while ensuring anonymity and adhering to privacy regulations. Sophisticated algorithms match and link data points, which can also be blended with demographic data, with customers segmented based on shared characteristics.
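A toy illustration of the linking step, using invented hashed identifiers: whenever two identifiers are observed together they are merged into the same anonymous profile, here via a simple union-find structure (real identity graphs are vastly more sophisticated and probabilistic).

parent = {}

def find(identifier):
    parent.setdefault(identifier, identifier)
    while parent[identifier] != identifier:
        parent[identifier] = parent[parent[identifier]]  # path compression
        identifier = parent[identifier]
    return identifier

def link(a, b):
    # Two identifiers observed together are merged into one anonymous profile
    parent[find(a)] = find(b)

link("hashed_email_ab12", "mobile_ad_id_9f3e")   # seen together in an app login
link("mobile_ad_id_9f3e", "cookie_77c1")         # seen together in a browser session

print(find("hashed_email_ab12") == find("cookie_77c1"))  # True - one unified profile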


Server to Server (S2S)

Some measurement experts talk about server-to-server (S2S) as a core method of attribution moving forwards. This method involves passing unique identifiers directly between advertisers, publishers and attribution platforms. These unique identifiers do not contain any personal information and data remains secure without the need for cookies. 
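In rough terms the flow looks like the hypothetical sketch below (not any specific vendor's API): the ad platform issues an opaque click ID at click time, the advertiser stores it server-side against the eventual order, and at conversion time the advertiser's server sends that ID, rather than any personal data, back to the measurement platform.

import uuid
import urllib.parse

# 1. At click time the ad platform issues an opaque identifier with no personal data
click_id = uuid.uuid4().hex

# 2. The advertiser's server stores it against the session and, later, the order
orders = {"ORDER-1001": {"click_id": click_id, "value": 59.99}}

# 3. At conversion time the advertiser's server calls the measurement platform's
#    server directly (the endpoint below is purely hypothetical)
order = orders["ORDER-1001"]
postback = "https://measurement.example.com/postback?" + urllib.parse.urlencode(
    {"click_id": order["click_id"], "value": order["value"]}
)
print(postback)  # in practice this would be an authenticated server-side request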


Clean Rooms

Another option could be ‘clean rooms’: secure and controlled environments where multiple parties can collaborate and analyse data. These rooms do not expose any user-specific data, so any attributed outputs will always be at an aggregate level. These solutions can be technically challenging to set up, creating a barrier for the majority of advertisers right now. However, they are privacy-safe by design.
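A toy sketch of the principle, with invented data and no particular clean-room product in mind: each party contributes hashed identifiers, the join happens inside the controlled environment, and only aggregates above a minimum threshold are ever released.

# Hashed identifiers contributed by each party (invented for illustration)
publisher_exposed = {"h1", "h2", "h3", "h4", "h5"}          # users shown the ad
advertiser_sales = {"h2": 40.0, "h4": 55.0, "h9": 30.0}     # user -> order value

MIN_AGGREGATE = 2  # outputs below this size are suppressed to protect individuals

matched = [value for user, value in advertiser_sales.items() if user in publisher_exposed]

if len(matched) >= MIN_AGGREGATE:
    print({"exposed_buyers": len(matched), "exposed_revenue": sum(matched)})
else:
    print("Audience too small to report")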


Browser Based Attribution 

Some of the latest proposals to maintain online attribution involve leveraging the user's browser to record the marketing interaction (the source event) and then match this with a conversion. This data alone is later transmitted back to the adtech company and, in some cases, publishers. To ensure user identity remains private, some additional noise is added to the reported data. This means the solution may not be viable in cost-per-conversion scenarios or when real-time bidding decisions are required.
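As a simplified sketch of the general idea (not any specific browser proposal), a reported aggregate might have Laplace-style random noise added so that no individual user's action can be inferred from it:

import random

def noisy_count(true_count, scale=2.0):
    # Laplace-distributed noise, generated as the difference of two exponential draws
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

print(noisy_count(120))  # close to, but deliberately not exactly, the true count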


---

EVER-PRESENT MEASUREMENT SOLUTIONS


Some marketing measurement solutions have been used for decades but have become more sophisticated and intuitive thanks to advances in digital technology. Limitations on available online data will still impact these models, albeit to a lesser degree. As such, these methods will play an increasingly critical role in the marketing measurement toolkit moving forwards.


Econometric / Media Mix Modelling

Econometrics is an established means of testing media activity, be it online or offline. Historically, econometrics required significant data collection. Granular marketing, website and offline data, and even external factors like weather and economic trends, are fed into advanced statistical models that output insights into channel effectiveness, ROI and attribution.


Media mix modelling (MMM) tends to be a simpler approach that focuses specifically on marketing channels and predefined metrics as outputs. Data is fed into a statistical model (typically a multivariate regression model) with a view to understanding the impact of the marketing channels on sales. 
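A toy example of the shape of such a model, with invented weekly figures: regress sales on spend per channel and read the coefficients as the estimated sales generated per unit of spend. Real media mix models add adstock, saturation curves, seasonality and far more.

import numpy as np

# Invented weekly data: spend per channel (£k) and sales (£k)
tv_spend = np.array([50, 60, 55, 70, 65, 80], dtype=float)
search_spend = np.array([20, 25, 22, 30, 28, 35], dtype=float)
sales = np.array([400, 460, 430, 520, 505, 590], dtype=float)

# Design matrix with an intercept to capture baseline (non-marketing) sales
X = np.column_stack([np.ones(len(sales)), tv_spend, search_spend])
coefficients, *_ = np.linalg.lstsq(X, sales, rcond=None)
baseline, tv_per_pound, search_per_pound = coefficients

print(f"Baseline {baseline:.1f}, TV {tv_per_pound:.2f} per £, Search {search_per_pound:.2f} per £")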


The great strength of econometric modelling is that it is one of the few techniques to incorporate offline marketing activity such as TV, outdoor and print into the model. As such, these models are often the default source of truth for agencies and brands who invest across these channels. From an online perspective, a weakness of econometrics can be a limited recognition of the nuances and granularity of online data. I remember a consultancy advising me to increase investment in branded search following the results of their econometric study across our channels, advice that arguably overlooked how much of that branded search demand had been generated elsewhere.


While contemporary advertising leverages a wealth of data streams feeding directly into econometric models, studies based on this data rarely translate into the frequent and timely insights essential for dynamic bid optimisation, a cornerstone of effective online advertising. This disconnect between readily available data and actionable knowledge hinders advertisers from fully capitalising on its potential.


Incrementality Studies

Incrementality studies are used to understand the significance of a media interaction in driving a conversion. The key question is whether the media actually drove the conversion or whether it would have occurred naturally.


Incrementality studies use a test-and-control methodology, often using location to divide users into groups. I was involved in one of the first geo-experiments in the UK, and the biggest challenge was creating these groups to ensure the experiment was as fair as possible. Today, the geo-X framework is well established and far simpler to deploy, with granular splits at a geo and a user level built into many advertising platforms.
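At its core the read-out is a simple comparison of conversion rates between the exposed and held-out groups; a sketch with invented figures:

# Invented geo-experiment results
test_conversions, test_population = 1_300, 100_000        # geos exposed to the media
control_conversions, control_population = 1_000, 100_000  # geos held out

test_rate = test_conversions / test_population
control_rate = control_conversions / control_population

incremental_conversions = (test_rate - control_rate) * test_population
relative_lift = (test_rate - control_rate) / control_rate

print(f"Incremental conversions: {incremental_conversions:.0f}, relative lift: {relative_lift:.0%}")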


There remain some drawbacks. In order to achieve statistically significant results, and consequently the most valuable outputs, advertisers are encouraged to ensure the experiment is of a certain size and runs for long enough to capture this data. This means that either the test or the control group is exposed to artificial conditions. For example, turning off marketing to users in the test group for a period of time could well lead to a drop in sales. As such, incrementality tests are most powerful when proof is required to support a hypothesis that is already predicted to be true.


---


CONCLUSIONS


At one point in my career I dreamed of a utopian measurement solution where every marketing and sales touchpoint could be tracked, sales and conversion data recorded appropriately, and automated bidding and buying working in tandem. I always looked to evolve the models used to ensure that new marketing touchpoints could be considered, in particular championing the inclusion of impressions and views in models previously focused purely on clicks.


It is clear today that this expectation is unlikely to materialise. If consumers do not want organisations to know the marketing interactions leading to a purchase, then this preference should be respected, but it means an attribution model will never be complete.


Consent has become the bedrock of privacy-compliant data collection, which in turn serves as the enabler for online attribution. The online industry collectively will need to work harder to communicate the benefits of consent to consumers. In my personal view, the web is a far poorer experience without data-driven advertising. Without data-driven placements, publishers are forced to find new means of recouping lost advertising revenues by increasing the volume and intrusiveness of adverts in a quest to obtain the all-important click.


Greater consent generates more first-party data, which can be fed into probabilistic, data-driven models. The more observable data that can be fed into these models, the more accurate they will become. Today, the majority of data-driven models are platform specific, due to the ability to leverage the platform's own first-party data. However, I expect more platform-agnostic solutions to materialise in the coming years.


2024 promises to be a pivotal year for online attribution as organisations look to adapt to a world without third party cookies and make the necessary adjustments to their online strategy. I believe it will be the organisations that are able to rapidly develop a first party data strategy and combine this with the latest machine learning based models that will reap the rewards.


002 | ATP 2023 Summary | A Tableau Dashboard

Tableau is an amazing tool for data visualisation. It is intuitive in its understanding of different data sets, contains a number of different templates that can be customised and incorporated into dashboards, and Tableau Public is available to anyone, for free.

To show off some of Tableau's capabilities, I wanted to identify a dataset that was structured but disparate, easily obtainable, but most importantly would be of interest to a relevant audience.

---

OBJECTIVE

The main objective was to demonstrate how the Top 10 male tennis players in the world had earned their ATP Ranking points throughout the year. I also had a few smaller hypotheses I wanted to test:

Hypothesis 1. Novak Djokovic had earned his deserved No. 1 ranking through performance in the Grand Slams vs other qualifying ATP tournaments.

Hypothesis 2. Unlike in 2022, there had been no breakout stars accelerating through the rankings in 2023 and entering the Top 10.

---

DATA PREPARATION 

I obtained data about each player's performance directly from the ATP website and consolidated this into a spreadsheet. I then cleaned the data by ensuring that there were no duplicates, evaluating any outliers and removing any irrelevant data.

1. Not every player competes in every event. This led to null values when transforming data using pivot tables and created challenges for generating some of the calculations. 

2. Some events occur simultaneously.  This led to unexpected duplication of the date field. To solve for this I wrote a formula to identify duplicate dates and concatenate tournaments based on having the same date. 

3. Date field recorded only as text. A common problem when importing data into Google Sheets is that dates are recognised only as text, and it resists attempts to convert the format. To overcome this, I used a combination of the left(), mid() and right() formulas to rebuild the text in a format that Google Sheets recognises as a date:

=right(F13,2)&"-"&mid(F13,6,2)&"-"&left(F13,4)
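For anyone doing the same clean-up in Python rather than Sheets, and assuming the source text is in YYYY-MM-DD form as the formula above implies, the equivalent would be:

from datetime import datetime

# "2023-07-16" stored as text becomes a true date, reformatted as DD-MM-YYYY
print(datetime.strptime("2023-07-16", "%Y-%m-%d").strftime("%d-%m-%Y"))  # 16-07-2023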

---

ADDING CALCULATIONS

The key calculation required from the data was the cumulative ATP points throughout the year. With all the data in one table, adding this required a small formula to accumulate player points after each tournament (Column G) while, importantly, resetting for each player (Column A):

=IF(A2=A3,F3+G2,F3)
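Should the data ever outgrow a spreadsheet, the equivalent calculation in Python is a grouped cumulative sum; the players and points below are purely illustrative:

import pandas as pd

results = pd.DataFrame({
    "player": ["Djokovic", "Djokovic", "Alcaraz", "Alcaraz"],
    "points": [2000, 720, 1000, 2000],
})

# Cumulative points per player, resetting automatically when the player changes
results["cumulative_points"] = results.groupby("player")["points"].cumsum()
print(results)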

---

THE TABLEAU DASHBOARD

Tableau has simple connections to multiple different file types, meaning importing the data was relatively straightforward. I created a variety of different visuals across different sheets and merged them into the dashboard here. A slightly squashed version can be seen below:

---

CONCLUSIONS

Hypothesis 1: Djokovic earned the majority of his ATP points through performance in the Grand Slams.

Djokovic scored 72% (7200) of his overall points in the year in the Grand Slams. In what was an almost perfect year, the Serb won the Australian Open, US Open, French Open and was edged in 5 sets by Alcaraz in the final at Wimbledon. Beyond the Grand Slams he entered 8 other tournaments, almost half the average of the other players. However, his performance at slams and the ATP Finals alone would not quite have been enough to finish the year in no.1 position as Alcaraz's 8,845 points picked up across 16 tournaments would have trumped the 8,500 Djokovic scored from these events. 

Hypothesis 2: Limited movement within the Top 10 Players in the world throughout 2023. 

Whereas 2022 saw the emergence of young stars Carlos Alcaraz, rising from 32 to no.1 in the world, and Holger Rune, from 111 to 11, 2023 saw a much more stable Top 10. The biggest riser throughout the year was perhaps the player who had the strongest end to the year, winning the Davis Cup for Italy: Jannik Sinner began 2023 ranked 15 in the world and finished ranked no.4. Lost to the Top 10 were Rafael Nadal, through injury, and Casper Ruud and Felix Auger-Aliassime, who struggled for form in the latter part of the year.

---

KEY LEARNINGS

This was the second Tableau dashboard I have ever built and quite a bit of trial and error was required to reach the eventual output. As well as the conclusions above, I wanted to document some of the key learnings so that I and others may avoid such mistakes in the future. 

1. Tableau does not always play nicely with crosstab data. There are some easy solutions available to solve this but I now know not to invest extra time getting data into a crosstab format for Tableau. 

2. Refreshing base data does not always work. If I added data to my Google Sheet and refreshed in Tableau, it did not always recognise the new data. I found it easier to create a new sheet, import it and link them.

3. Dashboards do not resize well for different screen sizes. I understand that this is particularly the case when you are using floating elements vs tiled. I have learnt that it's best to design for a smaller screen so that it can be upscaled. 

001


A BRIEF INTRO

I worked for Google for 12.5 years as a Senior Industry Manager for Technology / Retail / Electronics. Throughout this time I managed relationships with over 50 organisations, from small startups (who invested multiple millions per annum in Google advertising) to some of the largest multinational organisations in the world.

You can read more about my career to date on my profile page.  Sadly my role was made redundant earlier this year (2023) and as such I have been on gardening leave throughout the summer. 


A NEW CHAPTER

I was able to use this time to take a step back and reflect on the elements I enjoyed the most about my previous roles. I settled on three core things: 

-----

1. Finding interesting insights from data and building compelling narratives to drive business impact.

2. Helping other people maximise their own potential by removing barriers and providing clarity and focus. 

3. Feeling like I'm playing some role in helping to make the world a better place.

-----

As a result, I am embarking on a new journey into the world of Data Analytics and Science. The amount of data in the world is growing exponentially. I believe that being able to interpret this data and translate it into meaningful insights with actionable outputs will become an increasingly valuable skillset.

I have already completed the Google Data Analytics Professional Certificate, where I gained a good understanding of the data cycle and learnt how to use BigQuery and Tableau as well as the programming languages SQL and R.

I am now working my way through the excellent Data Science Infinity course created by Andrew Jones to go deeper into data science by learning about Machine Learning, Statistical Theory and Python. 


WRITING A BLOG IN 2023

When I first started a blog (in 2007), I was looking for an outlet for my thoughts on digital marketing mixed in with some of my personal interests in technology, consumer electronics and some comedy gold. This time I plan to document my journey into this exciting new world and also showcase some of the outputs along the way.

I welcome any engagement and feedback. 


Presenting Passion

The amount of training available in my new job is absolutely amazing but almost overwhelming. An early piece of advice I received was not to try and take in everything but to focus on the areas that will really make a difference in my job role and leave the cookery courses and sleep training to another day!

The blog has been quiet recently as there are naturally certain restrictions on what I am able to discuss in the public domain given my new position. I have to undertake a communications training course in order to be able to speak about matters relevant to online marketing! 

Yesterday however I attended a training course about creating and delivering presentations to large audiences. There are a number of presentation courses available in all organisations but I enjoyed this one as it focussed on a very simple yet effective methodology which I can easily align myself with as it underpins some of my own views.

I am a really keen presenter and continually look for opportunities to talk at conferences and lectures. I picked up some great tips yesterday, the biggest being the requirement to challenge oneself when presenting. Only by taking risks will we become better presenters.

I was however reminded of this excellent and notorious sketch by comedian Don McMillan on the pitfalls of presenting using PowerPoint. Until yesterday, this had always been the biggest influence on my presentation style.


Out in Slough Tonight!

Hoping to go to a "Hooch for a Pound, Wonderbras get in Free" night!