To support these use cases we have added a new URI redirect mode into the Clojure Collector. You update your link’s URI to point to your event collector, and the collector receives the click, logs a URI redirect event and then performs a 302 redirect to the intended URI. This is the exact model followed by ad servers to track ad clicks.
To use this functionality:
- Use the path `/r/tp2` for your collector URI - this tells Snowplow that you are attempting a URI redirect
- Add a `&u=` argument to your collector URI, where its value is your URL-encoded final URI to redirect to
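Putting the two pieces together, a redirect link can be assembled like this (the collector domain and destination URL below are made-up examples):

```python
from urllib.parse import urlencode

# Hypothetical collector domain and destination - substitute your own
collector = "http://events.example.com"
destination = "https://www.example.com/product"

# /r/tp2 signals a URI redirect; u= carries the URL-encoded target
redirect_link = collector + "/r/tp2?" + urlencode({"u": destination})
print(redirect_link)
# http://events.example.com/r/tp2?u=https%3A%2F%2Fwww.example.com%2Fproduct
```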
The URI redirection will be recorded using the JSON Schema com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-0-0.
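For illustration, a recorded redirect would carry a self-describing JSON along these lines (the destination URI is a made-up example; consult the schema itself for the authoritative field list):

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-0-0",
  "data": {
    "uri": "https://www.example.com/product"
  }
}
```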
For more information on how this functionality works, check out the Click tracking section in our Pixel Tracker documentation.
We will be adding this capability into the Scala Stream Collector in Release 74.
One powerful attribute of having Snowplow event collection on your own domain (e.g. `events.snowplowanalytics.com`) is the ability to capture first-party cookies set by other services on your domain, such as ad servers or CMSes. These cookies are stored as HTTP headers in the Thrift raw event payload by the Scala Stream Collector.
Prior to this release there was no way of accessing these cookies in the Snowplow Enrichment process. That changes now, thanks to Snowplow community member Kacper Bielecki's new Cookie Extractor Enrichment. This is our first community-contributed enrichment - a huge milestone, and hopefully the first of many! Thanks so much Kacper.
The example configuration JSON for this enrichment is as follows:
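A minimal sketch of such a configuration, assuming the standard Snowplow enrichment JSON layout (verify against the wiki page referenced below):

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/cookie_extractor_config/jsonschema/1-0-0",
  "data": {
    "name": "cookie_extractor_config",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "cookies": ["sp"]
    }
  }
}
```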
This default configuration captures the Scala Stream Collector's own `sp` cookie - in practice you would probably extract other, more valuable cookies available on your company domain. Each extracted cookie will end up as a single derived context following the JSON Schema org.ietf/http_cookie/jsonschema/1-0-0.
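For example, an extracted `sp` cookie would surface as a derived context along these lines (the cookie value here is a made-up example):

```json
{
  "schema": "iglu:org.ietf/http_cookie/jsonschema/1-0-0",
  "data": {
    "name": "sp",
    "value": "c5f3a09f-75f8-4309-bec5-fea560f78455"
  }
}
```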
For more information see the Cookie extractor enrichment page on the Snowplow wiki.
Please note that this enrichment only works with events recorded by the Scala Stream Collector - the CloudFront and Clojure Collectors do not capture HTTP headers.
This release comes with 3 new SQL scripts that deduplicate events in Redshift using the event fingerprint that was introduced in Snowplow R71. For more information on duplicates, see the recent blogpost that explores the phenomenon in more detail.
The first script deduplicates rows with the same `event_fingerprint`. Because these events are identical, the script leaves the earliest one in atomic and moves all others to a separate schema. There is an optional last step that also moves all remaining duplicates (same `event_id` but different `event_fingerprint`). Note that this could delete legitimate events from atomic.
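The keep-the-earliest logic of the first script can be sketched with an in-memory SQLite table (the real scripts target Redshift; the table and column names here are simplified stand-ins):

```python
import sqlite3

# Sketch of fingerprint-based dedup: keep one copy per event_fingerprint
# in "events", move the other copies to "duplicates".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT, event_fingerprint TEXT, collector_tstamp TEXT);
CREATE TABLE duplicates (event_id TEXT, event_fingerprint TEXT, collector_tstamp TEXT);
INSERT INTO events VALUES
  ('e1', 'fp-a', '2015-10-01 10:00:00'),
  ('e1', 'fp-a', '2015-10-01 10:00:00'),
  ('e2', 'fp-b', '2015-10-01 11:00:00');
""")

# MIN(rowid) stands in for "the earliest row" - the rows are identical,
# so any one copy per fingerprint can be the survivor
conn.executescript("""
INSERT INTO duplicates
SELECT event_id, event_fingerprint, collector_tstamp FROM events
WHERE rowid NOT IN (SELECT MIN(rowid) FROM events GROUP BY event_fingerprint);
DELETE FROM events
WHERE rowid NOT IN (SELECT MIN(rowid) FROM events GROUP BY event_fingerprint);
""")
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])      # 2
print(conn.execute("SELECT COUNT(*) FROM duplicates").fetchone()[0])  # 1
```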
The second is an optional script that deduplicates rows with the same `event_id` where at least one row has no `event_fingerprint` (older events). The script is identical to the first script, except that an event fingerprint is generated in SQL.
The third script is a template that can be used to deduplicate unstructured event or custom context tables. Note that contexts can have legitimate duplicates (e.g. 2 or more product contexts that join to the same parent event). If that is the case, make sure that the context is defined in such a way that no 2 identical contexts are ever sent with the same event. The script combines rows where all fields but `root_tstamp` are equal. There is an optional last step that moves all remaining duplicates (same `root_id` but at least one field other than `root_tstamp` is different) from atomic to duplicates. Note that this could delete legitimate events from atomic.
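The combine step of the third script can be sketched the same way (the table name and the `sku` column are invented for the example):

```python
import sqlite3

# Sketch of collapsing context rows that differ only in root_tstamp,
# keeping the earliest timestamp; names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_context (root_id TEXT, root_tstamp TEXT, sku TEXT)")
conn.executemany("INSERT INTO product_context VALUES (?, ?, ?)", [
    ("r1", "2015-10-01 10:00:00", "sku-1"),
    ("r1", "2015-10-01 10:05:00", "sku-1"),  # differs only in root_tstamp
    ("r1", "2015-10-01 10:00:00", "sku-2"),  # a legitimate second context
])

# Group on every field except root_tstamp; the two sku-1 rows collapse,
# the sku-2 row survives as a distinct context
deduped = conn.execute("""
    SELECT root_id, MIN(root_tstamp) AS root_tstamp, sku
    FROM product_context
    GROUP BY root_id, sku
""").fetchall()
print(len(deduped))  # 2
```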
These scripts can be run after each load using SQL Runner. Make sure to run the setup queries first.
This release bumps the Clojure Collector to version 1.1.0.
To upgrade to this release:
You need to update the version of the Enrich jar in your configuration file:
If you wish to use the new cookie extractor enrichment, write a configuration JSON and add it to your enrichments folder. The example JSON can be found here.
Install the following tables in Redshift as required:
For more details on this release, please check out the R72 Great Spotted Kiwi release notes on GitHub. Specific documentation on the two new features is available here:
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.
By popular request, we are adding a section to these release blog posts to trail upcoming Snowplow releases. Note that these releases are always subject to change between now and the actual release date.
Upcoming releases are:
`atomic.events` and adds the ability to load bad rows into Elasticsearch
Other milestones being actively worked on include Avro support #1, Weather enrichment and Snowplow CLI #2.