Developing a single customer view with Snowplow


Developing a single customer view and effectively identifying users has become a hot topic in the analytics community for two main reasons:

These challenges require companies to rethink how they collect and use data, and how they want to establish a mutually beneficial relationship with their users.

First-party versus third-party data collection

A number of browsers (most prominently Safari, but also Firefox and Edge) are restricting the use of cookies with the aim of protecting users’ privacy. Their measures target third-party data collection, as is done by ad tech companies as well as multi-tenanted analytics solutions such as Google Analytics. Specifically, these measures aim to stop companies from tracking data about users they have no direct relationship with. The strictest of these measures (at the time of writing), Apple’s ITP 2.3 (announced in September 2019), limits first-party client-set cookies (and the equivalent use of local storage) to a single day. In response, many companies tracking user behaviour for internal purposes have switched to first-party data collection.





User stitching

The key steps in developing a single customer view are:

This process is often referred to as user stitching. For example, user stitching can involve connecting events from users before and after they log in (or identify in some other way), or mapping together the same user across different devices.

The remainder of this blog post focuses on how to get started with user stitching if you use Snowplow to collect behavioural data from your digital products.

Capturing user identifiers

Out of the box identifiers on web

The Snowplow JavaScript tracker sets two cookies by default, containing user identifiers and a session identifier. These UUIDs are tracked with all web events.

If tracking is deployed to multiple root domains, a function called crossDomainLinker can be used to pass the domain_userid on one domain through a link click across to another domain, where it will be captured alongside that domain’s domain_userid, in the refr_domainuserid field. Aside from the network_userid this can provide an additional method for linking together domain_userids across domains. More information on this link decoration method can be found in the Snowplow docs.
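To make the link decoration concrete, here is a minimal sketch of what ends up on the destination URL. It assumes the decorated link carries a `_sp` querystring parameter of the form `<domain_userid>.<timestamp in ms>`; the URL and the helper name `parse_cross_domain_param` are hypothetical. The pipeline does this extraction for you, so the code is only meant to illustrate the mechanism.

```python
from urllib.parse import urlparse, parse_qs


def parse_cross_domain_param(page_url):
    """Return (domain_userid, timestamp_ms) from the `_sp` parameter, or None.

    Assumes the format `_sp=<domain_userid>.<timestamp>`; illustrative only.
    """
    query = parse_qs(urlparse(page_url).query)
    values = query.get("_sp")
    if not values:
        return None
    # The domain_userid is a UUID (no dots), so split on the first dot.
    duid, _, tstamp = values[0].partition(".")
    return duid, (int(tstamp) if tstamp.isdigit() else None)


# Hypothetical landing-page URL on the second domain after a decorated link click.
url = (
    "https://shop.example.com/landing"
    "?_sp=4f8e0c1a-aaaa-bbbb-cccc-1234567890ab.1700000000000"
)
print(parse_cross_domain_param(url))
```

The first element corresponds to what is stored in refr_domainuserid, next to the destination domain's own domain_userid.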

A note on ITP and other browser privacy measures

If you are tracking only one domain, and you set your Snowplow collector to have a CNAME that is a subdomain of that domain, the network_userid cookie becomes a first-party, server-side cookie, entirely unaffected by the browser measures discussed above. If you are tracking multiple domains, Snowplow supports setting up multiple CNAMEs for your collector such that each domain can have its own first-party, server-side cookie set against its collector domain. However, this approach means that you lose the ability to stitch users across domains using the network_userid, as this ID is now no longer a third-party ID shared across all the domains you are tracking.

Either way, if a significant portion of your users are affected by ITP and other browser privacy measures, you might want to move to using the network_userid as your primary user identifier, rather than the domain_userid. If your collector CNAME is already a subdomain of (one of) your domain(s), you can check whether the network_userid is working as expected by counting the network_userids that have more than one domain_userid (for a given root domain as captured in the page_urlhost). Please note that if you switch to the network_userid as your primary user identifier, you will also want to re-index your sessions based on that ID (i.e. correct the domain_sessionidx).
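The check described above can be sketched as follows. This is an illustration on a handful of in-memory events rather than a warehouse query; the field names follow the Snowplow events table, while the sample values and the function name are made up.

```python
from collections import defaultdict


def network_ids_with_multiple_domain_ids(events):
    """Count, per page_urlhost, the network_userids seen with >1 domain_userid."""
    seen = defaultdict(set)  # (host, network_userid) -> set of domain_userids
    for event in events:
        host = event.get("page_urlhost")
        nuid = event.get("network_userid")
        duid = event.get("domain_userid")
        if host and nuid and duid:
            seen[(host, nuid)].add(duid)

    counts = defaultdict(int)
    for (host, _nuid), duids in seen.items():
        if len(duids) > 1:
            counts[host] += 1
    return dict(counts)


# Hypothetical sample: n1 outlives a reset client-side cookie (d1 -> d2).
events = [
    {"page_urlhost": "www.example.com", "network_userid": "n1", "domain_userid": "d1"},
    {"page_urlhost": "www.example.com", "network_userid": "n1", "domain_userid": "d2"},
    {"page_urlhost": "www.example.com", "network_userid": "n2", "domain_userid": "d3"},
]
print(network_ids_with_multiple_domain_ids(events))
```

A non-trivial count suggests that the server-set cookie is persisting across resets of the client-set one, which is the behaviour you would want before promoting the network_userid to primary identifier.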

Out of the box identifiers on mobile

Snowplow’s iOS and Android SDKs can send a context containing a few mobile-specific user identifiers with all mobile events (called mobile_context). Additionally, the session_context allows you to configure client side sessionization and then capture it with all mobile events.

Setting a custom user ID (all platforms)

All Snowplow trackers allow you to set a custom user ID to be sent with all subsequent events. This will be stored in the user_id field in the main events table.

In certain cases there might be more than just a single custom user ID available for identifying a user, such as a name, email, or additional IDs from different systems. Custom contexts can be defined and sent with all events to capture these, and are also supported on all platforms.

In some cases, events themselves are centered around user identifiers, for example when ingesting email activity from an emailing service (like Mailchimp or Sendgrid) via webhook, or collecting ad impressions via the pixel tracker. In those cases, custom (self-describing) events can be defined to capture this information.

Enriching data with further identifiers

There are two points in the Snowplow pipeline where you can add information to the data collected via the Snowplow trackers and webhooks: during enrichment (in real time) and in the data warehouse (hourly or daily).

Adding user information during enrichment

Snowplow offers two ways to add additional information to the data in real-time:

We recommend adding information in real time when it is either likely to change frequently and therefore needs to be added the moment the event occurs, or the data will be used in real time. For all other use cases, we recommend adding additional information to events in the data warehouse (as described in the next section).

Adding user information in the data warehouse

Once your event data is in your data warehouse, you can join it with data from other internal and external systems, such as your CRM, ESP or transactional database. For this purpose, it may be useful to capture at least one internal user identifier with your Snowplow events.
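As a rough illustration of such a join, the sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The `crm_customers` table, its columns and all sample rows are hypothetical; only the idea of joining on an internal user identifier comes from the text.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (event_id TEXT, user_id TEXT, event_name TEXT);
    CREATE TABLE crm_customers (user_id TEXT, plan TEXT, country TEXT);

    INSERT INTO events VALUES
        ('e1', 'u1', 'page_view'),
        ('e2', NULL, 'page_view'),
        ('e3', 'u2', 'video_play');
    INSERT INTO crm_customers VALUES
        ('u1', 'premium', 'DE'),
        ('u2', 'free', 'GB');
    """
)

# LEFT JOIN keeps events from users who have not identified (user_id IS NULL).
rows = conn.execute(
    """
    SELECT e.event_id, e.user_id, c.plan
    FROM events AS e
    LEFT JOIN crm_customers AS c ON c.user_id = e.user_id
    ORDER BY e.event_id
    """
).fetchall()
print(rows)
```

Events without an internal identifier come back with empty CRM columns, which is why capturing at least one internal ID with your events matters.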

Developing a hierarchy of identifiers

The hierarchy of user identifiers will reflect their availability (how many are captured across the different instrumented platforms) and reliability (internal IDs versus cookies or mobile device identifiers).
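One way to picture a hierarchy is as an ordered list of fields, from most to least reliable, where the first identifier present on an event wins. The order in `HIERARCHY` below is an assumption made for illustration, not a recommendation; your own order depends on which identifiers you capture and how far you trust them.

```python
# Assumed order, most reliable first: internal ID, server-set cookie,
# mobile vendor identifier, client-set cookie.
HIERARCHY = ["user_id", "network_userid", "apple_idfv", "domain_userid"]


def primary_identifier(event, hierarchy=HIERARCHY):
    """Return (field, value) for the highest-ranked identifier on the event."""
    for field in hierarchy:
        value = event.get(field)
        if value:
            return field, value
    return None  # no known identifier on this event


print(primary_identifier({"domain_userid": "d1", "network_userid": "n1"}))
print(primary_identifier({"user_id": "u1", "network_userid": "n1"}))
```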

Let’s consider the following example: a company runs a website and a mobile app. Its collector CNAME is a subdomain of the website’s domain. Users can perform some actions on the website without identifying themselves, such as reading articles, but need to register or log in to perform others, such as watching videos. They can also sign up for a newsletter to be notified when new content is available. The mobile app is for registered, logged-in users only, i.e. users are prompted to log in on the home screen of the app when they open it. This company can therefore capture the following user identifiers with events:

Based on these identifiers, it could define the following hierarchy:


Building the mapping table

Once the hierarchy of user identifiers has been established, a model needs to be developed to create and update the mapping table based on the events collected. This is most commonly done in the data warehouse (using SQL), but can also be achieved in real time (for example using Spark and a database optimized for fast reading and writing). Continuing with the example from the previous section, and assuming this table is built in the data warehouse where the entire history of events is available, one might develop the following logic:

As soon as a user identifies on web, we want to map their (internal) user_id to their network_userid (our reliable first-party, server-side cookie). The same goes for mobile: we want to match their apple_idfv and all known apple_idfa (or android_idfa) values with their user_id. Additionally, we might want to capture all associated domain_userids, and their email in case they ever signed up for an email newsletter. If a user has identified across web and mobile, we can also map all of these identifiers together based on the user_id.

In our mapping table, we don’t really need to capture users that have not yet identified themselves. Stitching together user activity from before and after they register, and across platforms, happens when the mapping table is integrated into the data models.
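The logic above could be sketched in two passes over the event history. This is a simplified illustration in plain Python rather than warehouse SQL: `build_mapping` and the sample events are hypothetical, identifiers are only propagated one step (no transitive closure), and an identifier shared by several users is simply kept with the first user seen.

```python
from collections import defaultdict

ID_FIELDS = ("network_userid", "domain_userid", "apple_idfv", "apple_idfa", "email")


def build_mapping(events):
    """Map each user_id to every other identifier observed for that user."""
    mapping = defaultdict(lambda: defaultdict(set))
    owner = {}  # (field, value) -> user_id, learned from identified events

    # Pass 1: events that carry a user_id link it directly to their other IDs.
    for event in events:
        user_id = event.get("user_id")
        if not user_id:
            continue
        for field in ID_FIELDS:
            value = event.get(field)
            if value:
                mapping[user_id][field].add(value)
                owner.setdefault((field, value), user_id)

    # Pass 2: anonymous events sharing an identifier with a known user
    # contribute their remaining identifiers (e.g. a newsletter email).
    for event in events:
        if event.get("user_id"):
            continue
        present = [(f, event[f]) for f in ID_FIELDS if event.get(f)]
        user_id = next((owner[p] for p in present if p in owner), None)
        if user_id is None:
            continue  # never identified: stays out of the mapping table
        for field, value in present:
            mapping[user_id][field].add(value)

    return {
        uid: {field: sorted(values) for field, values in fields.items()}
        for uid, fields in mapping.items()
    }


events = [
    {"network_userid": "n1", "domain_userid": "d1", "email": "a@example.com"},  # newsletter signup
    {"network_userid": "n1", "domain_userid": "d1", "user_id": "u1"},  # web login
    {"user_id": "u1", "apple_idfv": "v1"},  # mobile app
    {"network_userid": "n9", "domain_userid": "d9"},  # never identifies
]
mapping = build_mapping(events)
print(mapping)
```

Note that the visitor behind n9 never appears in the result, in line with the point that unidentified users do not need a row in the mapping table.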

Applying user stitching to your data models

While the user mapping table constitutes the source of truth for user identification, only when combining it with the continuous stream of information about user behaviour does it lead to a single customer view. Let’s consider the following series of events, based on the setup described in section 2.1:

Let’s now suppose the marketing team would like to attribute any new registrations to previous marketing touches. This obviously cannot be done based on the user_id, as this ID is only assigned right at the end of the journey under consideration. It also cannot be done by using the network_userid alone, as the email marketing campaign would not be included. However, as each of the events described above has at least one user identifier available, and each identifier overlaps with at least one other identifier, the user mapping table will contain the following mapping:

user_id <> network_userid <> email 

Based on this, both the email marketing campaign and the Facebook ad can be linked to the user registration.
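A minimal sketch of that linking step follows. The three events, their `channel` and `tstamp` fields and the helper names are hypothetical stand-ins for the journey discussed above; the mapping mirrors the user_id <> network_userid <> email row.

```python
# One row of the mapping table: user_id <> network_userid <> email.
mapping = {"u1": {"network_userid": ["n1"], "email": ["a@example.com"]}}


def build_lookup(mapping):
    """Invert the mapping table into (field, value) -> user_id."""
    return {
        (field, value): user_id
        for user_id, fields in mapping.items()
        for field, values in fields.items()
        for value in values
    }


def stitch(event, lookup):
    """Resolve an event to a user_id, directly or via the mapping table."""
    if event.get("user_id"):
        return event["user_id"]
    for field in ("network_userid", "email"):
        value = event.get(field)
        if value and (field, value) in lookup:
            return lookup[(field, value)]
    return None


def touches_before_registration(events, lookup):
    """List the marketing channels each user touched before registering."""
    registered_at = {}
    for event in events:
        user_id = stitch(event, lookup)
        if event["event"] == "registration" and user_id:
            registered_at[user_id] = event["tstamp"]

    touches = {user_id: [] for user_id in registered_at}
    for event in sorted(events, key=lambda e: e["tstamp"]):
        user_id = stitch(event, lookup)
        if (
            user_id in registered_at
            and event.get("channel")
            and event["tstamp"] < registered_at[user_id]
        ):
            touches[user_id].append(event["channel"])
    return touches


events = [
    {"tstamp": 1, "event": "email_click", "email": "a@example.com", "channel": "email"},
    {"tstamp": 2, "event": "page_view", "network_userid": "n1", "channel": "facebook"},
    {"tstamp": 3, "event": "registration", "network_userid": "n1", "user_id": "u1"},
]
result = touches_before_registration(events, build_lookup(mapping))
print(result)
```

Neither marketing event carries a user_id, yet both resolve to the registering user through the mapping table.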

Start building your single customer view

As the example above shows, often data collected from digital products only becomes insightful or actionable when joined with other data sources, and when user identifiers across platforms and channels are stitched together. Therefore, developing a single customer view is crucial in effectively utilizing your data asset.

If you are interested in learning more about how you can develop a unified view of your customers, get in touch with us today. As a Snowplow BDP customer, you can reach out directly to your Customer Success Manager.
