The following all make web analytics hard.
Much of the discussion around Tim’s original post was about whether the complexity was the fault of web analytics platforms or not. As should be clear from the above, I believe that a certain amount of complexity is inherent in web analytics. However, Omniture’s SiteCatalyst (around which much of the discussion about Tim’s blog post focused) actually manages to make things worse.
Web data is complicated. And not just for the obvious reasons that there’s a lot of it, and it is generated very quickly:
The key to handling this kind of complexity is to:
Raw web event data is hard to make sense of: we need contextual knowledge to do so. To give two examples:
Let’s take the example of a user on a shopping site who buys a pair of running shoes. There are several things we might infer from the data:
Whether we should make the above inferences from the data, and whether we use those inferences in other analyses down the line, are decisions that can only be taken based on our broader understanding of the business and the ways users engage with it. That kind of contextual knowledge isn’t stored in the data itself. These decisions look different depending on the type of entity we are dealing with (customer vs product vs article vs company etc.) and the kind of reasoning we need to perform about that entity.
What is interesting about the above is that these decisions are reasonably straightforward for a data analyst with the data in front of her to make: we all know, for example, that changes in a person’s gender are unusual, and so if the user is a man today, he is likely to still be a man in six months’ time. But they are not straightforward for a technology platform to make.
Interpreting web event data is hard because it is a function of all the above. To analyse web data, you need to appreciate all three factors. Unfortunately, you cannot control for all of them. A/B testing means you can control for site design, to a limited extent, and comparing user actions between different sets of users enables you to control (a little bit) for user intention. Exercising that control, though, is very difficult. For the most part, web analysts are like astrophysicists, able to capture data, but limited in the experiments they can run to unpick the impact of different factors on that data.
Once again, an intelligent analyst is best placed to unpick the impact of the three factors identified above. It is a near-impossible task for a web analytics platform to perform, because the platform lacks that contextual knowledge.
The key, then, to handling complexity related to the amount of domain knowledge that is required to generate meaning out of web data, is to give the analyst the freedom to address the above questions using all the domain expertise at her disposal, and trust that she uses that domain knowledge to:
These challenges are much better met by a person with that contextual knowledge than by a web analytics program that lacks it. The web analytics program really needs to get out of the way of the analyst, so she can address them directly.
Web event data can be used to help answer a whole host of business questions. Some important questions include:
These questions are hard to answer because:
Once again, the key to handling this complexity is to give the analyst the tools and the space to develop and experiment with different approaches. There is no general purpose tool that will be able to solve for all of the above, although specific tools may be able to answer specific questions.
The above sources of complexity make it clear why web analytics is hard. They present challenges for both web analysts and web analytics tools.
One approach to dealing with that complexity is to “disguise it”. The web analytics tool hides the underlying complexity behind a UI that presents specific cuts of the data. Many of the contributors to the Google+ thread argued that this was how GA manages to be simpler than SiteCatalyst. Certainly, you can hide all the complexity behind a simple dashboard. But then, you can’t use a dashboard to answer any of the above questions. In this case, what you gain in simplicity, you lose in power and transparency.
Another approach, which is the one we have taken at Snowplow, is to expose the underlying data to the user in a format (data model) that is as easy as possible to understand, and in a data store that is easy to connect to multiple different analytics tools. This doesn’t disguise any of the complexity: instead, it exposes it all to the analyst. For many analysts, that is a terrifying prospect. But for some, it is truly liberating: the analyst can now use the analytic and technical approach she prefers to develop answers and insights, unconstrained by any assumed logic in the web analytics tool.
A third approach, taken by Omniture with Sitecatalyst, manages to exacerbate the complexity because of two poor decisions made around Sitecatalyst’s technical architecture:
To implement Sitecat, you have to translate the events that occur on your website, and the entities a user navigating your website engages with, into the arcane world of Traffic Variables, Success Events, Conversion Variables and Saint Classifications. Your data model is, in many cases, flattened to fit a set of pre-defined fields in Omniture. Contrast that with the much simpler, event-centric approach taken by just about everyone that’s developed a platform in the last five years, including Mixpanel, Kissmetrics, KeenIO, Google Analytics and of course Snowplow.
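To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the event fields, the generic slot names and the mapping are invented for illustration and are not the actual schema of Sitecatalyst, Snowplow or any other tool. It records the same purchase twice, once as a self-describing event and once flattened into a fixed set of generic fields whose meaning lives only in a separate mapping.

```python
# Hypothetical illustration: neither structure is a real vendor schema.

# Event-centric: the record describes itself.
event = {
    "event": "purchase",
    "user_id": "u-123",
    "product": {"sku": "RUN-42", "category": "running shoes", "price": 79.99},
}

# Flattened: a fixed set of generic slots. What each slot means is
# decided at implementation time and kept in a separate mapping.
SLOT_MAPPING = {
    "slot1": ("product", "sku"),
    "slot2": ("product", "category"),
    "slot3": ("product", "price"),
}


def flatten(event, mapping):
    """Copy nested event values into the generic slots named in mapping."""
    flat = {"user_id": event["user_id"], "event_type": event["event"]}
    for slot, (entity, field) in mapping.items():
        flat[slot] = event[entity][field]
    return flat


flat = flatten(event, SLOT_MAPPING)
print(flat)
```

Anyone reading the flattened record needs the mapping to hand to know that `slot2` holds a product category, whereas the event-centric record can be read on its own.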
How you capture a data point in Sitecatalyst determines which reports that data point is used in, and how that data point is used subsequently. That is why, at a simplistic level, you absolutely need to understand Traffic Variables, Success Events, Conversion Variables and Saint Classifications, and how Sitecatalyst treats each of them, in order to do a Sitecatalyst implementation properly. That makes implementations significantly more complicated than they need to be, and it makes the impact of “bad” implementations much more catastrophic than it needs to be.
In contrast, with Snowplow, no restrictions are placed on how you use any data based on how you choose to capture it. That is because data analysis is completely decoupled from data capture: we only enable you to capture and warehouse your data. You then do whatever you want with it, often in a different tool.
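As a rough sketch of what decoupling capture from analysis means in practice, the Python below keeps a small raw event log and then runs two unrelated analyses over it. The event names and fields are hypothetical, not Snowplow’s actual data model; the point is only that nothing about how a row was captured limits which analysis it can later appear in.

```python
from collections import Counter

# Hypothetical raw event log, captured with no reporting logic attached.
events = [
    {"user_id": "u1", "event": "page_view", "page": "/shoes"},
    {"user_id": "u1", "event": "add_to_basket", "sku": "RUN-42"},
    {"user_id": "u2", "event": "page_view", "page": "/shoes"},
    {"user_id": "u2", "event": "page_view", "page": "/socks"},
    {"user_id": "u1", "event": "purchase", "sku": "RUN-42"},
]


def views_per_page(events):
    """One analysis: count page views by page."""
    return Counter(e["page"] for e in events if e["event"] == "page_view")


def purchasers(events):
    """A different analysis over the same rows: which users bought."""
    return {e["user_id"] for e in events if e["event"] == "purchase"}


page_counts = views_per_page(events)
buyers = purchasers(events)
print(page_counts, buyers)
```

Adding a third analysis later means writing another function over the same rows, with no change to how the events were collected.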
Given the two massive disadvantages of the tight coupling, it seems only fair to ask if there are any benefits associated with it. There is one that is worth exploring: when you collect your data properly in Sitecatalyst, Sitecat then ensures that that data point is accommodated in every report in which it features. By taking more effort earlier on (at implementation time) to get your data to fit into Sitecat’s rigid data model, you can then breathe easy down the line that anyone using the data via the UI is restricted so that they only use the data properly: they don’t, for example, mix dimensions and metrics with different scope.
We think this “advantage” is not really worth anything. We think it is much easier to work out what dimensions and metrics you should, and should not, plot against one another when you have the data in front of you, and much harder when the data is just an idea at implementation time. What is more, if you cut your data in a way that doesn’t make sense down the line, it is an easy mistake to spot and fix. In contrast, if you stuff up a Sitecat implementation, it can be hard and costly to fix, and you might have lost months of data in the meantime.
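A small sketch of the kind of scope mistake in question, and of why it is easy to spot once the data is in front of you. The data and field names are hypothetical: revenue is assumed to be recorded once per session, and page views once per hit.

```python
# Hypothetical data: revenue is recorded once per session,
# page views are recorded once per hit.
sessions = {"s1": {"revenue": 100.0}, "s2": {"revenue": 0.0}}
page_views = [
    {"session_id": "s1", "page": "/shoes"},
    {"session_id": "s1", "page": "/checkout"},
    {"session_id": "s2", "page": "/shoes"},
]

# Mixing scopes: attaching session-level revenue to every page view row
# counts the revenue for s1 once per page viewed in that session.
naive_total = sum(sessions[pv["session_id"]]["revenue"] for pv in page_views)

# Keeping to one scope: sum revenue over sessions only.
true_total = sum(s["revenue"] for s in sessions.values())

print(naive_total, true_total)
```

With the raw rows available, comparing the two totals exposes the double counting straight away, and the fix is a change to the query rather than to the implementation.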
To Omniture’s credit, the two technical decisions described above were made in the late 1990s, when the web looked very different, and so they were not such bad decisions at the time. Since then, Omniture has had to accommodate growing complexity in the web by making incremental changes to their platform, rather than reinventing the core platform with a fresh perspective, the way we’ve been able to with Snowplow. But that provides little comfort for the company that has to reimplement Sitecatalyst because they got the implementation wrong the first time.
Yes! Web analytics is hard. But tools like Sitecatalyst make it harder than it needs to be, especially at implementation time. The idea that implementing Sitecatalyst is more difficult than Google Analytics or Snowplow because Sitecatalyst is more powerful than GA is only partly true at best. It is more difficult because reporting and data capture are too tightly coupled, and the data model is totally unnatural to the uninitiated.