Identity stitching is the process of identifying all the different events on a particular user’s journey and stitching them together to form a complete record of that journey. Identity stitching is a key step in any customer-centric analysis: if we cannot reliably identify that a set of actions was carried out by a particular user, we cannot accurately:
Unfortunately, identity stitching is hard. Users typically interact with websites from multiple different devices and browsers. They may regularly clear their cookies. And they may share computers (and hence cookie IDs) with other users.
This is one reason why companies often like to get users to identify themselves on their websites by logging in. In doing so, the user identifies himself or herself unambiguously. When a user does so, we have the opportunity to pass that data into the analytics system and use it to drive the identity stitching process. In Snowplow, this is done using the setUserId method. One of the selling points of Universal Analytics is that it offers its own, comparable way of passing in user IDs. KISSmetrics offers an identify and alias method and Mixpanel offers a distinct_id and alias method, both of which enable user IDs to be passed in for analytics purposes and predate the Snowplow and Universal Analytics approaches.
Even when users log in to a website, however, identity stitching is not straightforward. To take an obvious example: a user may visit a site multiple times before he / she registers for that service, which is nearly always a prerequisite before he / she can log in. This represents a key part of the user journey that companies will want to analyze (especially when optimizing their customer acquisition spend). However, as Shay Sharon explains, Mixpanel does not correctly attribute event data from sessions that occurred before the user registered, only from earlier events in the session in which he / she first logged in. As Yehoshua Cohen explains, Universal Analytics has the same limitation. Only KISSmetrics manages to stitch together data from those earlier sessions to the user ID that was passed in subsequently.
There are other cases, however, where KISSmetrics gets it wrong. To give one example, where two users visit a website from the same computer, but only the second user logs in, KISSmetrics will erroneously count those two users as the same person.
There are two underlying reasons why the approaches to identity stitching taken by Universal Analytics, KISSmetrics and Mixpanel fall short:
Snowplow takes a radically different approach to identity stitching:
To take a simple example, we can implement the approach used by KISSmetrics to perform identity stitching. In this case, we would:
1. Use the setUserId method to pass in the user ID to Snowplow when a user logs in. This is saved into the Snowplow events table in the user_id field.
2. Create a table that maps cookie IDs (domain_userid or network_userid, depending on whether we’re using 1st or 3rd party cookies) to user_id.
3. For any cookie ID (domain_userid or network_userid) where the user logged in at any stage of his / her journey, all the events associated with those cookie IDs will be ascribed to the single associated user_id.
This approach is described in detail in the Analytics Cookbook.
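As a rough illustration of the steps above, here is a minimal Python sketch that works on an in-memory list of events rather than a real Snowplow events table. The sample data and the function names (build_cookie_map, stitch) are hypothetical, and two choices are assumptions of this sketch rather than anything Snowplow prescribes: when a cookie has seen more than one login, the first one wins, and cookies that never logged in fall back to the cookie ID itself.

```python
# Hypothetical sample events. The field names follow the text above:
# domain_userid is the cookie ID, user_id is the value passed via setUserId.
events = [
    {"event_id": 1, "domain_userid": "cookie-a", "user_id": None},
    {"event_id": 2, "domain_userid": "cookie-a", "user_id": "alice"},
    {"event_id": 3, "domain_userid": "cookie-b", "user_id": None},
    {"event_id": 4, "domain_userid": "cookie-c", "user_id": "alice"},
    {"event_id": 5, "domain_userid": "cookie-a", "user_id": None},
]


def build_cookie_map(events, cookie_field="domain_userid"):
    """Build the cookie ID -> user_id mapping table.

    Assumption: if several user IDs appear on one cookie, keep the first.
    """
    cookie_to_user = {}
    for event in events:
        if event["user_id"] is not None:
            cookie_to_user.setdefault(event[cookie_field], event["user_id"])
    return cookie_to_user


def stitch(events, cookie_field="domain_userid"):
    """Ascribe every event on a mapped cookie to the associated user_id.

    Assumption: cookies that never logged in keep the cookie ID as identity.
    """
    cookie_to_user = build_cookie_map(events, cookie_field)
    return [
        {**event,
         "stitched_user_id": cookie_to_user.get(event[cookie_field],
                                                event[cookie_field])}
        for event in events
    ]


for row in stitch(events):
    print(row["event_id"], row["stitched_user_id"])
```

Note that events 1 and 5, which carry no login of their own, are still ascribed to the user who logged in on the same cookie. That is exactly the behaviour that goes wrong when two people share a computer.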
For many of our users, however, this approach is either too simplistic or inappropriate:
If any of the above are true, a totally different approach to identity stitching is required. Fortunately, businesses running Snowplow are free to develop and employ whatever algorithm they want to perform identity stitching. Snowplow does a number of things to give businesses as much flexibility as possible when developing their own algorithms:
In general at Snowplow, we believe strongly in decoupling event-data collection from applying business logic to the event data collected. The reason is simple - for a rapidly evolving business, business logic will change over time. If the analytics system is going to be able just to keep pace with the business (and really, we believe that the data should be driving that change, rather than playing catch-up), it needs to be possible to evolve the way business logic is implemented in the analytics system over time. By decoupling the collection of event data from the application of business logic, we buy ourselves that flexibility.
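To make the decoupling idea concrete, here is a small Python sketch. It assumes the raw events are stored once and never modified, and that each identity rule is just a function applied when the data is analysed. The sample data and the rule names (by_cookie, by_login) are hypothetical and only stand in for whatever logic a business actually chooses.

```python
# Raw events are collected once and never rewritten (hypothetical sample data).
RAW_EVENTS = [
    {"domain_userid": "cookie-a", "user_id": None},
    {"domain_userid": "cookie-a", "user_id": "alice"},
    {"domain_userid": "cookie-b", "user_id": "alice"},
    {"domain_userid": "cookie-c", "user_id": None},
]


def by_cookie(event, events):
    """Rule v1: treat every cookie ID as a distinct user."""
    return event["domain_userid"]


def by_login(event, events):
    """Rule v2: use a login ID seen on the same cookie, else the cookie ID."""
    for other in events:
        if other["domain_userid"] == event["domain_userid"] and other["user_id"]:
            return other["user_id"]
    return event["domain_userid"]


def count_users(events, rule):
    """Apply an identity rule to the full raw data set and count distinct users."""
    return len({rule(event, events) for event in events})


print(count_users(RAW_EVENTS, by_cookie))
print(count_users(RAW_EVENTS, by_login))
```

Because the rule is applied at analysis time, switching from by_cookie to by_login changes the answer for the whole history of events without re-collecting anything.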
It turns out that identity stitching is just one type of business logic that benefits from this approach. Rules for defining sessions, segmenting audiences, assigning users to cohorts, categorising and classifying content items and media, and even defining KPIs can all change over time. To date, Snowplow is the only web analytics platform that enables companies to evolve the way business logic is defined on their underlying event data, and apply updated definitions to the complete data set over time.