This is an 8-part series
Chapter 1 The state of web analytics in 2021
Chapter 2 Privacy updates, ad blockers, and the need for first-party tracking
Chapter 3 Building a web analytics stack: packaged vs modular
Chapter 4 The best-in-class tools for web analytics
Chapter 5 Redefining web analytics metrics
Chapter 6 Data modeling for web analytics
Chapter 7 Snowplow for web analytics
Chapter 8 How Welcome to the Jungle took ownership of their web data with Snowplow
Web analytics has evolved a lot over the last 25 or so years. From humble beginnings with Webtrends and the analysis of raw server log files to see which pages were requested from your server most often, web analytics now helps businesses understand what is happening on their web apps and platforms in greater detail than ever before.
But how did we get where we are today? Technology has changed massively in that time, as has the way people use the web, and it’s worth taking a look back to see how far we’ve come, and to understand why things have evolved the way they have.
Web in its infancy
When people did browse the web, there were also far fewer things that they could do online. Ecommerce was still in its infancy, and the range of products you could buy online was limited (Amazon.com had only recently branched out from selling just books). There were some social media platforms (MySpace, Facebook est. 2004), but without the proliferation of smartphones, their impact on users’ lives was small. The idea that people went to the web to waste time wasn’t really commonplace. It was much more likely that users would go to their computer “to surf the internet” (as if it were an activity or event), rather than having instant access to the internet at all times.
Content consumption on the internet was almost exclusively text and image based, and primarily text. Connection speeds were nowhere near fast enough to allow for reliable on-demand music or video playback.
Not only was it different from a user’s perspective, it was also different for brands and businesses. A number of brands didn’t have any online presence at all (certainly not on social media). Even for those that did, the options for online marketing and advertising were limited to static banner ads (paid for by reserving a spot on a website for a finite period of time), email newsletters and paid search advertising through Google (which had by this time established itself as the number one search engine).
In this more limited web era, web analytics could quite easily cater to this kind of user behaviour. With users browsing on single devices, from single locations, for a singular purpose, on simple and static websites, the page view -> visit -> visitor model (later known more commonly as page view -> session -> user) made understanding what users were doing on your site relatively straightforward.
Where the web (analytics) was won
Over time, GA gained further functionality, including event tracking (for tracking interactions such as form submissions and clicks around the site), ecommerce tracking and integrations with other Google marketing products such as Google AdWords and Google Webmaster Tools.
GA was well-positioned to help analysts answer the questions that made sense at this time, given that user behaviour and web technology were much simpler than today. Coupled with its free price point, this allowed Google to bring web analytics to the masses.
A web revolution
Over time, however, the way people use the web has evolved significantly. The iPhone was released in 2007, changing the way users browsed both on-the-go and at home, as did, to a lesser extent, the iPad in 2010. Internet connections (both broadband and mobile networks) became far faster and more reliable, which enabled on-demand video and music streaming, as well as live streaming.
Web technologies and frameworks (such as React, Angular and Vue) have been developed which enable web applications that were little more than pipe dreams in the early 2000s. You can buy more items online today than you ever could before (cars, stocks/shares/options, groceries, ISAs etc.), which brings users to the web more frequently. Not only that, we can now manage our finances through online and mobile banking, start relationships with online dating apps and services, track our health with fitness apps and so on.
The same is true for businesses (handling finances through Xero or QuickBooks, customer support through Zendesk, virtual events using BrightTalk etc.). We have more reasons to use the internet and the web than ever before.
And since the variety of items people can use or buy online is now so vast, and most people use the web on multiple devices, the customer journey is more complex than it has ever been. Research Online Purchase Offline (ROPO) is very common nowadays. A user journey might start on mobile, after seeing a dynamically targeted video ad in a social media app during a commute, and end with researching and purchasing in a desktop web browser at home after receiving a triggered marketing email. Journeys like this, and others that are more complex still, are commonplace today.
All of these societal and technological changes mean that the websites that analysts are analyzing today look and behave very differently than they did when websites were just static pages and a few buttons and forms.
And yet, the majority of the most popular web analytics tools out there today still use a data model and frameworks as if we’re still analyzing simple websites, with users that only have a single device.
Breaking out of the GA paradigm
Taking Google Analytics as a primary example: GA relies on page views as its key hit type. It groups page views that share the same cookie ID (not the same user) into sessions, and then ties those sessions together under that cookie. Most metrics and dimensions in GA are tied to the concept of a session, such as device type, landing page, channel, conversion rate and bounce rate. GA therefore requires that you send page view hits so that it can construct sessions.
There are two key problems with this approach.
Firstly, a session in GA is a complex and untidy concept. It combines a timeout window based on when GA receives hits, campaign timeout windows based on how long you wish a marketing or advertising campaign to be applied, cross-domain tracking issues, referral exclusions, changing acquisition sources, sessions being cut off at midnight and so on. Changes to any of the configurable options will affect how a session is defined, and therefore all the metrics and dimensions that are tied to the concept of a session. Since these metrics can be changed so drastically by small changes in configuration, it’s difficult to have confidence in them.
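To make the sensitivity to configuration concrete, here is a minimal sketch of timeout-based sessionisation for a single cookie ID. It is not GA’s actual implementation: the function `count_sessions`, its parameters and the sample timestamps are all hypothetical, and it models only two of the rules mentioned above (an inactivity timeout and a split at midnight).

```python
from datetime import datetime, timedelta

def count_sessions(hit_times, timeout_minutes=30, split_at_midnight=True):
    """Count sessions for one cookie ID from a list of hit timestamps.

    Hypothetical, simplified rules: a new session starts on the first hit,
    after a gap longer than the timeout, or (optionally) when the calendar
    date changes between two consecutive hits.
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = 0
    previous = None
    for current in sorted(hit_times):
        starts_new_session = (
            previous is None
            or current - previous > timeout
            or (split_at_midnight and current.date() != previous.date())
        )
        if starts_new_session:
            sessions += 1
        previous = current
    return sessions

# Made-up hits from one late-night visit that crosses midnight.
hits = [
    datetime(2021, 3, 1, 23, 40),
    datetime(2021, 3, 1, 23, 55),
    datetime(2021, 3, 2, 0, 5),
    datetime(2021, 3, 2, 0, 45),
]

print(count_sessions(hits))                           # 3
print(count_sessions(hits, timeout_minutes=60))       # 2
print(count_sessions(hits, split_at_midnight=False))  # 2
print(count_sessions(hits, timeout_minutes=60, split_at_midnight=False))  # 1
```

The same four hits yield one, two or three sessions depending on two settings alone, so any metric with sessions in its denominator (conversion rate, bounce rate) moves with them.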
Secondly, consider a web application that doesn’t fit nicely into the page view -> session -> user framework, like twitter.com for instance. A user can visit twitter.com/home, scroll through their timeline, hover over users’ avatars to see a profile card, like and retweet individual tweets, and follow or unfollow users, all from a single page that also auto-refreshes the feed. In a traditional sense, all of this happens within just one page view, since the page never reloads. If Twitter were using GA for their web analytics, without extreme customization a high proportion of their “sessions” would likely consist of only a few page views, and their bounce rates would be high. The standard data model enforced by most web analytics tools doesn’t fit the web of today.
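The gap between what happens on such a page and what a page-view-centric model records can be sketched as follows. This is an illustration only: the `Interaction` type, the interaction names and the simplified definition of a bounce (a session with exactly one recorded hit) are assumptions made for the example, not any tool’s real schema.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    kind: str  # hypothetical interaction name, e.g. "page_view" or "like"
    url: str   # page on which the interaction happened

def page_view_hits(interactions):
    """Keep only what page-view-only tracking would record."""
    return [i for i in interactions if i.kind == "page_view"]

def is_bounce(recorded_hits):
    """Simplified assumption: a session with one recorded hit is a bounce."""
    return len(recorded_hits) == 1

# One visit to a single-page timeline: plenty of activity, one page load.
visit = [
    Interaction("page_view", "/home"),
    Interaction("scroll", "/home"),
    Interaction("hover_profile_card", "/home"),
    Interaction("like", "/home"),
    Interaction("retweet", "/home"),
    Interaction("follow", "/home"),
]

recorded = page_view_hits(visit)
print(len(recorded), is_bounce(recorded))  # 1 True
print(len(visit), is_bounce(visit))        # 6 False
```

Under these assumptions an engaged visit is reported as a bounce unless every interaction is tracked as its own event, which is the kind of customization the paragraph above refers to.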
The standardized page view -> session -> user paradigm doesn’t fit a lot of web experiences in 2021. The BMW car customiser, an online learning provider like Udemy or a streaming service like Twitch are all web applications for which the standard web analytics data model makes no sense anymore.
It’s worth noting that there are a number of websites out there that do still fit this model – most publisher and ecommerce websites generally do fit, for instance, since users move from product pages, to search results pages and checkout and confirmation pages, or users read articles. For those businesses out there, the model still fits well. However, a growing number of businesses do not fit this model, and even publishers and ecommerce businesses are starting to change their web experiences away from what we might call a “traditional” website model.
From websites to digital products
There is a valid argument that the web applications described above are better suited to product analytics than to web analytics, and therefore require a different set of tools that cater to requirements traditional web analytics tools don’t meet. This is true in a number of cases, although more and more of these “product” type applications are appearing in what could be considered “traditional” websites. The other drawback is that specialist product analytics tools often lack the high-level perspective that web analytics tools are excellent at providing. We are seeing product analytics and web analytics tools come closer together, a trend that will probably carry on over the next few years.
Overall, many modern web analytics tools are generally poorly equipped to provide the deep level of detail required to understand user behaviour across complex web user journeys. This is not a revelation or an unpopular opinion. This challenge has been picked up by the largest player in the market, in an attempt to help analysts better answer those questions that traditional web analytics tools struggle with.
Google Analytics 4 is the latest version of Google Analytics, and it completely changes how the product works, from the interface to the underlying data model. Instead of the traditional page view -> session -> user framework, GA4 shifts to an events -> users data model. This is a big change, and it also requires a change in mindset for web analysts who have used Universal Analytics (“old” GA) for a number of years. GA4 also provides the ability to export data to Google BigQuery at no additional fee, providing mass access to event-level data in a SQL data warehouse for the first time.
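As a rough illustration of what an events -> users model looks like in practice, the sketch below treats every interaction as a flat event record and rolls events up directly to users, with no session in between. The field names (`user_id`, `event_name`, `timestamp`) and the sample events are hypothetical and are not GA4’s actual export schema.

```python
from collections import Counter, defaultdict

# Illustrative event records; the field names are assumptions for this sketch.
events = [
    {"user_id": "u1", "event_name": "page_view", "timestamp": 1},
    {"user_id": "u1", "event_name": "video_play", "timestamp": 2},
    {"user_id": "u1", "event_name": "video_play", "timestamp": 3},
    {"user_id": "u2", "event_name": "page_view", "timestamp": 4},
    {"user_id": "u2", "event_name": "add_to_cart", "timestamp": 5},
]

def events_per_user(event_rows):
    """Count each event type per user, skipping the session layer entirely."""
    summary = defaultdict(Counter)
    for event in event_rows:
        summary[event["user_id"]][event["event_name"]] += 1
    return summary

summary = events_per_user(events)
print(dict(summary["u1"]))  # {'page_view': 1, 'video_play': 2}
print(dict(summary["u2"]))  # {'page_view': 1, 'add_to_cart': 1}
```

With event-level rows like these in a warehouse, an analyst can define their own groupings and metrics instead of inheriting a fixed session definition.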
This major change from Google shows they acknowledge the need for a new look at web analytics. The fact that GA4 is based on Firebase, which was a tool for tracking interactions on mobile apps, shows how Google sees the two worlds coming closer together.
Beyond GA4 – the future of web analytics
GA4, however, does not fix everything. Reliable cross-device attribution, off-site measurement and the integration of other channels such as CRM, all while respecting your users’ privacy, are still challenges for all online businesses. GA4 will not fix these issues, nor will it fix issues with poorly constructed metrics.
To summarise, web analytics tools have struggled to keep pace with the changes in user behaviour and technology over the last 10-15 years. Web analytics solutions need to give businesses and analysts the ability to customise their tracking and data models so that they truly fit the web applications their customers use, and so that they can fully understand the behaviour those users exhibit. Without the understanding that comes from rich and detailed behavioural data, businesses cannot expect to provide the best user experience across all touchpoints.
In the upcoming posts in this series on web analytics, we will cover some of the big topics and challenges that need to be addressed in order to go to the next level and gain the most value from your behavioural web data:
- Privacy, security, ad-blockers etc…
- Challenges related to relying on a packaged solution
- Building out more meaningful web analytics metrics from your behavioural data
- How best to model your behavioural web data