How to leverage cookieless and anonymous tracking with Snowplow

Learn how to configure Snowplow to create a strategy for optimal, compliant and ethical tracking

The data collection landscape is constantly evolving, from privacy-focused browser updates to increased user awareness to new or updated regulations. At Snowplow we support these changes and want to help you collect behavioural data in a privacy-friendly and ethical manner. It is therefore important that your tools give you the flexibility to collect data in a way that suits your use cases while respecting user privacy. The latest releases of the Snowplow JavaScript Tracker and Snowplow Stream Collector expand your options for designing a tracking strategy.

This post explores 6 different approaches for ethically tracking users on the web with your Snowplow pipeline, ranging from tracking more to tracking less:

  1. Track users anonymously until they consent, and then track them with server-side and client-side identifiers.
  2. Track users with only server-side cookies and IP addresses, when they opt in, along with client-side sessionization in cookies or local storage.
  3. Track users with only server-side cookies and IP addresses, when they opt in, which reduces the client-side cookie footprint.
  4. Track sessions only, with no server-side cookies or IP address capture, but with client-side sessionization in cookies or local storage.
  5. Track users with no cookies but capture IP addresses, when they opt in.
  6. Track users anonymously all the time, with a cookieless, first-party tracking approach. This uses no browser storage, does not capture IP addresses and requires no cookie banner, although you may still wish to request consent to track.

We will look at how to configure your trackers and collector, allowing you to build a strategy around those options. We will explain what they mean for the data that lands in your data warehouse and what your end users’ experience could be like. We’ll also explore server side anonymization/pseudonymization, allowing for analytical queries whilst preserving privacy.

Snowplow user identifiers: a recap

When using the Snowplow JavaScript Tracker alongside the Snowplow Stream Collector, the most popular Snowplow configuration, you will collect 2 user identifiers automatically along with session information, user IP address, and any custom user identifiers you have specified.

In the canonical Snowplow event model, we refer to the two user identifiers as domain_userid and network_userid, the IP address as user_ipaddress and the custom user identifier as user_id.

The domain_userid is a client side identifier, set by the Snowplow JavaScript Tracker in a cookie (or local storage) within the user’s browser. This identifier has come under fire in recent years from browser vendors, with changes such as ITP limiting its lifetime to 7 days. However, it is still a valuable identifier for many who are tracking across multiple top level domains with a single Snowplow collector. The same cookie is also used to store the client side session data, which is sent with each event to help you understand the user’s behaviour across each of their sessions on your website.

The network_userid is a server-side identifier, set by the Snowplow Stream Collector in a cookie in the user’s browser. This identifier is transmitted with every request the Snowplow JavaScript Tracker makes to the Snowplow Stream Collector. It is typically viewed as a very reliable user identifier; however, it has also come under pressure from ITP when used in a third party context (when the Snowplow collector is on a different top level domain to the website).

With the right configuration of your Snowplow JavaScript Tracker and Snowplow Stream Collector you can avoid many of the browser restrictions and create reliable user identifiers alongside valuable session information generated by the Snowplow JavaScript Tracker. You can read more about how to configure your Snowplow pipeline to collect complete and accurate data here.

Cookieless Tracking

Toggleable cookieless and anonymous tracking

Since Snowplow JavaScript Tracker 2.17.0 and Snowplow Stream Collector 2.1.0, it has been possible to toggle both the client-side and server-side cookies as you wish. For example, you might track a page view with no user identifiers when a user lands on the page, because they haven’t accepted the cookie consent banner yet. Once they have accepted, you can start to send events with user identifiers and session information.

This mode gives you completely cookieless data collection by default, with the option to switch on cookies when the user consents. We call this feature anonymous tracking. Let’s take a look at how to achieve this with the Snowplow JavaScript Tracker.

When you create your tracker, initialize it in anonymous tracking mode with server anonymization. You should also use an eventMethod of post rather than beacon, because with beacon the tracker stores a session cookie to improve the reliability of events sent on Safari.

snowplow("newTracker", "sp", "{{collector_url_here}}", {
  appId: "my-app-id",
  eventMethod: "post",
  anonymousTracking: { withServerAnonymisation: true },
  stateStorageStrategy: "none",
  contexts: {
    webPage: true
  }
});

Note: Enabling server anonymization requires the Snowplow Stream Collector v2.1.0+. Using a lower version of the Snowplow Stream Collector will cause events to fail to send until server anonymization is disabled.

With the above configuration, the tracker stores no cookies in the user’s browser, and in your data warehouse you will have null domain_* fields, a random network_userid and a user_ipaddress of unknown. However, any events you do track will still be captured, so you’ll understand how many hits your site receives (and how many come from visitors who haven’t accepted your cookie banner), what marketing campaigns are driving traffic, which buttons are clicked on a given page, and much more, all without impacting user privacy. This is perfect for the many use cases where you don’t need user or session identifiers.

If you display a cookie consent banner to your users, you can disable the anonymous tracking features once they accept. You will then start to track with user identifiers, IP addresses and session information. To do that, when your consent banner is accepted, you simply have to call:

snowplow('disableAnonymousTracking', 'cookieAndLocalStorage');

And if you ever want to activate anonymous tracking again, maybe when a user revokes consent, you simply call the inverse:

snowplow('enableAnonymousTracking', { withServerAnonymisation: true });
snowplow('clearUserData'); // Optionally clear client side cookies

And there you have it, a cookieless solution which gives you control over how you wish to identify users based on their preferences. Until this release, it’s been hard to understand how many visits a web site receives before cookie banners are accepted, without investing in server side tracking or looking at web site logs. That’s no longer the case as you can now have a third party cookieless solution with Snowplow.
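To make the consent flow above concrete, here is a minimal sketch of wiring a consent decision to these calls. The applyConsent helper and its consentGiven flag are hypothetical names used for illustration; only the disableAnonymousTracking, enableAnonymousTracking and clearUserData calls come from the snippets above. The tracker function is passed in as an argument so the helper can be tried with a stub.

```javascript
// Hypothetical helper: maps a consent decision onto the tracker calls shown above.
// `snowplow` is the tracker's global function, passed in so a stub can be used.
function applyConsent(snowplow, consentGiven) {
  if (consentGiven) {
    // Consent accepted: start storing identifiers in cookies and local storage.
    snowplow('disableAnonymousTracking', 'cookieAndLocalStorage');
  } else {
    // Consent declined or revoked: return to fully anonymous tracking
    // and clear any identifiers already stored client side.
    snowplow('enableAnonymousTracking', { withServerAnonymisation: true });
    snowplow('clearUserData');
  }
}
```

On a real page you might call applyConsent(window.snowplow, true) from your banner’s accept handler, and applyConsent(window.snowplow, false) when consent is revoked.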

Disable cookies but capture IP addresses

Using the above option, it is possible to disable cookies all of the time. By simply enabling anonymous tracking mode with server anonymization and never disabling it, you will never create any cookies, and the collector will never capture cookies or IP addresses. However, that isn’t the only way to achieve this, and it removes any way of identifying users in the data warehouse.

You can disable server side cookies altogether in your Snowplow collector configuration without the Snowplow JavaScript Tracker being in anonymous mode. This prevents the collector from generating cookies but still allows it to collect IP addresses and any other cookies which may be automatically sent to the collector by the browser (you can then extract them with the cookie extractor enrichment).

To disable collector cookies, Snowplow Insights customers can simply contact Snowplow Support. Open source users will need to set the following parameter in the collector HOCON:

cookie {
  enabled = false
}

Using this option allows for a cookieless experience for users. However, capturing the IP address at the collector still allows you to perform some user analysis, particularly when coupled with the user agent information which is also captured. To ensure no cookies are stored client side, you will also need to run the Snowplow JavaScript Tracker in anonymous mode, but without server anonymization enabled:

snowplow("newTracker", "sp", "{{collector_url_here}}", {
  appId: "my-app-id",
  eventMethod: "post",
  anonymousTracking: true,
  stateStorageStrategy: "none",
  contexts: {
    webPage: true
  }
});

This technique does capture PII (personally identifiable information) though, as an IP address is generally considered PII and other cookies could also potentially contain PII. To deal with this, Snowplow has capabilities to anonymise this data as a data enrichment step which we describe later in this article.

Server side only user identifiers and cookies

One new option that client side anonymization opens up is reducing the cookie footprint of your Snowplow pipeline. Running the Snowplow JavaScript Tracker in client side anonymous mode, while still allowing the Snowplow collector to set its cookie, means fewer cookies are stored in the user’s browser. That means fewer cookies to mention and describe in cookie policies and fewer cookies for your users to worry about.

This technique works best when the collector cookie is configured to be a reliable first party cookie, sent from a collector running on a subdomain of the main website. To ensure your collector’s cookies are configured to work in this way, you should follow our recent post on how to configure your Snowplow pipeline to collect complete and accurate data here.

With a reliable first party network_userid, your reliance on the domain_userid decreases or disappears altogether, so disabling this user identifier is a good option. You can do so when initialising the tracker:

snowplow("newTracker", "sp", "{{collector_url_here}}", {
  appId: "my-app-id",
  eventMethod: "post",
  anonymousTracking: true,
  stateStorageStrategy: "cookieAndLocalStorage",
  contexts: {
    webPage: true
  }
});

However, since you will be storing cookies in the user’s browser, you may still want to perform client side sessionization using cookies. To do this, configure your tracker as follows:

snowplow("newTracker", "sp", "{{collector_url_here}}", {
  appId: "my-app-id",
  eventMethod: "post",
  anonymousTracking: { withSessionTracking: true },
  stateStorageStrategy: "cookieAndLocalStorage",
  contexts: {
    webPage: true
  }
});

This now means when describing the Snowplow cookie in a cookie policy, the client side cookie will only be used for sessionization and will not create another user identifier. You can read the descriptions of our cookies here.

You may wish to remove all user identifiers but retain session tracking using the client side session cookies (or local storage). This means that no user level identifiers are tracked (which preserves privacy), while session level aggregation is still possible. To do this, configure your tracker with both anonymous tracking options:

snowplow("newTracker", "sp", "{{collector_url_here}}", {
  appId: "my-app-id",
  eventMethod: "post",
  anonymousTracking: { withSessionTracking: true, withServerAnonymisation: true },
  stateStorageStrategy: "localStorage",
  contexts: {
    webPage: true
  }
});

Anonymization

Your Snowplow pipeline contains two techniques for anonymizing information: the IP anonymization and PII Pseudonymization enrichments.

IP anonymization

The IP anonymization enrichment will anonymize the IP addresses found in the user_ipaddress field by replacing a certain number of octets or segments with “x”s. The number of octets or segments which are replaced is configurable. For example, anonymizing two octets would change an IPv4 address of 255.255.255.255 to 255.255.x.x.

IP anonymization can handle both IPv4 and IPv6: you can configure anonOctets for IPv4 addresses and anonSegments for IPv6 addresses. You can find out how to configure the IP anonymization enrichment here.
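As a rough illustration of the IPv4 example above, the following sketch reproduces the effect of masking octets. It is not the enrichment’s implementation: the maskIpv4 function is a hypothetical name, its anonOctets parameter simply borrows the name of the setting mentioned above, and it assumes the trailing octets are the ones replaced, as in the 255.255.x.x example.

```javascript
// Illustrative only: mimics the effect described above for IPv4 addresses,
// not the enrichment's actual implementation.
function maskIpv4(ip, anonOctets) {
  const octets = ip.split('.');
  if (octets.length !== 4 || anonOctets < 0 || anonOctets > 4) {
    throw new Error('expected an IPv4 address and anonOctets between 0 and 4');
  }
  // Replace the last `anonOctets` octets with "x" and keep the rest.
  return octets
    .map((octet, index) => (index >= 4 - anonOctets ? 'x' : octet))
    .join('.');
}

console.log(maskIpv4('255.255.255.255', 2)); // 255.255.x.x
```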

PII Pseudonymization

Where you are capturing PII with the user’s consent, you can further protect the user’s privacy and ensure the data points are not stored in their original form with the PII Pseudonymization enrichment. This also helps with regulatory compliance. After pseudonymization, you can still perform analysis on the data, but without being able to reverse it back to its original value.

This works because each value selected for pseudonymization is hashed to the same value every time it arrives in the Snowplow pipeline. This means you can still join data on the hashed values, or count distinct values.

The PII Pseudonymization enrichment is configured by choosing which fields to hash along with other configuration settings related to the hashing itself. There are a number of hashing algorithms available: MD2, MD5, SHA-1, SHA-256, SHA-384 and SHA-512. SHA-256 is a common choice as it offers a good balance between performance and security. You can also add a salt when hashing the values. The full set of configuration options and examples can be found here.
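The property that makes joins and distinct counts possible is that hashing is deterministic. The sketch below demonstrates this with SHA-256 using Node.js’s built-in crypto module. The pseudonymize function is a hypothetical name, and appending the salt to the value is an assumption made for illustration, not a description of how the enrichment combines them.

```javascript
const { createHash } = require('crypto');

// Illustrative only: concatenating value and salt is an assumption,
// not the enrichment's documented behaviour.
function pseudonymize(value, salt) {
  return createHash('sha256').update(value + salt, 'utf8').digest('hex');
}

const first = pseudonymize('user@example.com', 'my-salt');
const second = pseudonymize('user@example.com', 'my-salt');
console.log(first === second); // true: the same input always hashes to the same value
```

Because the output is stable for a given input and salt, two tables pseudonymized with the same settings can still be joined on the hashed column.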

Summary

Below is a table of the different strategies outlined at the beginning of this post. Each strategy here assumes you can track anonymously before consent, if desired, but then apply one of the 6 strategies post consent (except for strategy 6 which may not require consent).

| # | Strategy | Cookies | User Identifiers | Consent Banner | Useful PII Enrichments |
| --- | --- | --- | --- | --- | --- |
| 1 | Track all user identifiers with client side and server side cookies and IP addresses | Yes, 3 | network_userid, domain_userid, domain_sessionid, user_ipaddress | Yes | PII Pseudonymization, IP Anonymization |
| 2 | Track with server-side cookies and client side sessionization cookie | Yes, 3 | network_userid, domain_sessionid, user_ipaddress | Yes | PII Pseudonymization, IP Anonymization |
| 3 | Track server-side cookies and IP addresses only | Yes, 1 | network_userid, user_ipaddress | Yes | PII Pseudonymization, IP Anonymization |
| 4 | Track session identifiers only | Yes, 2 | domain_sessionid | Yes | PII Pseudonymization |
| 5 | Track with no cookies but capture IP address | No | user_ipaddress | Yes | IP Anonymization |
| 6 | Track anonymously all the time | No | | Optional | |
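For reference, the tracker options used earlier in this post can be gathered into one lookup keyed by strategy number. This is only a restatement of the snippets above under some assumptions: trackerOptionsByStrategy and trackerConfigFor are hypothetical names, strategy 1 is shown in its post-consent state (anonymous tracking off), and collector-side settings, such as disabling collector cookies for strategy 5, are not covered.

```javascript
// Restates the tracker options from this post, keyed by strategy number.
// Collector-side settings (e.g. disabling collector cookies for strategy 5)
// must be configured separately.
const trackerOptionsByStrategy = {
  1: { anonymousTracking: false, stateStorageStrategy: 'cookieAndLocalStorage' },
  2: { anonymousTracking: { withSessionTracking: true }, stateStorageStrategy: 'cookieAndLocalStorage' },
  3: { anonymousTracking: true, stateStorageStrategy: 'cookieAndLocalStorage' },
  4: {
    anonymousTracking: { withSessionTracking: true, withServerAnonymisation: true },
    stateStorageStrategy: 'localStorage'
  },
  5: { anonymousTracking: true, stateStorageStrategy: 'none' },
  6: { anonymousTracking: { withServerAnonymisation: true }, stateStorageStrategy: 'none' }
};

// Builds the configuration object passed as the last argument to "newTracker".
function trackerConfigFor(strategy, appId) {
  const options = trackerOptionsByStrategy[strategy];
  if (!options) {
    throw new Error(`Unknown strategy: ${strategy}`);
  }
  return { appId, eventMethod: 'post', contexts: { webPage: true }, ...options };
}
```

You could then initialise a tracker with, for example, snowplow("newTracker", "sp", "{{collector_url_here}}", trackerConfigFor(6, "my-app-id")).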

Designing an optimal, compliant and ethical tracking solution

With the various components of the Snowplow pipeline described in this post, you are empowered to build a data collection system that is capable of collecting a rich data set whilst being compliant with regulatory requirements and allowing you to make ethical choices when it comes to collecting user data.

There are many options to select from when considering how to design your data collection, and whether to collect anonymous data is an important part of that. Anonymous tracking works well for some use cases, and having these options available gives users of Snowplow the flexibility to decide how best to implement theirs.

For Snowplow Insights customers, if you want to talk more about these new options plus the existing ones, please reach out to your customer success manager who will happily guide you through the options best suited to your website or applications. 

If you’d like to know more about anonymous tracking, you can find more information in our Snowplow JavaScript Tracker documentation here.
