The API Request Enrichment lets you effectively join arbitrary entities to your events during the enrichment process, as opposed to attaching the data in your tracker or in your event data warehouse. This is very powerful, not least for the real-time use case where performing a relational database join post-enrichment is impractical.
This is our most configurable enrichment yet: the API lookup can be driven by data extracted from any field found in the Snowplow enriched event, or indeed any JSON property found within the derived_contexts field. It lets you extract multiple entities from the API’s JSON response as self-describing JSONs for adding back into the derived_contexts field.
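To make this concrete, here is a sketch of what such a configuration could look like. It follows the general pattern of input fields, an API endpoint template, JSONPath-driven outputs and a cache, but the URIs, schema references and field values below are placeholders for illustration only; defer to the tutorial and wiki page linked below for the authoritative format:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "api_request_enrichment_config",
    "enabled": true,
    "parameters": {
      "inputs": [
        {
          "key": "user",
          "pojo": { "field": "user_id" }
        }
      ],
      "api": {
        "http": {
          "method": "GET",
          "uri": "http://api.example.com/users/{{user}}?format=json",
          "timeout": 5000,
          "authentication": {}
        }
      },
      "outputs": [
        {
          "schema": "iglu:com.example/user_profile/jsonschema/1-0-0",
          "json": { "jsonPath": "$.record" }
        }
      ],
      "cache": { "size": 3000, "ttl": 60 }
    }
  }
}
```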
For a detailed walk-through of the API Request Enrichment, check out our new tutorial, Integrating Clearbit business data into Snowplow using the API Request Enrichment.
You can also find out more on the API Request Enrichment page on the Snowplow wiki.
In R72 Great Spotted Kiwi we released the Cookie Extractor Enrichment, which lets you capture first-party cookies set on your domain by other services such as ad servers or CMSes. This data is extracted from the HTTP headers stored in the Thrift raw event payload by the Scala Stream Collector.
Depending on your tracking implementation, these HTTP headers can contain other data relevant for analytics - and with this release, community member Khalid Jazaerly has contributed a powerful new HTTP Header Extractor Enrichment to capture them.
The configuration is similar to the one for the Cookie Extractor Enrichment:
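Here is a sketch of that configuration, assuming the enrichment follows the standard Snowplow enrichment JSON layout and takes a headersPattern regular expression as its parameter (check the wiki page linked below for the definitive format):

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/http_header_extractor_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "http_header_extractor_config",
    "enabled": true,
    "parameters": {
      "headersPattern": ".*"
    }
  }
}
```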
This configuration will extract all headers from HTTP requests, including cookies; in practice you would probably extract more specific headers. Each extracted header will be stored as a single derived context with the JSON Schema org.ietf/http_header/jsonschema/1-0-0.
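For instance, a captured User-Agent header might surface as a derived context along these lines - the exact property names are defined by the org.ietf/http_header schema on Iglu Central, and the values here are purely illustrative:

```json
{
  "schema": "iglu:org.ietf/http_header/jsonschema/1-0-0",
  "data": {
    "name": "User-Agent",
    "value": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
  }
}
```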
Please note that this enrichment only works with events recorded by the Scala Stream Collector - the CloudFront and Clojure Collectors do not capture HTTP headers.
You can find out more on the HTTP Header Extractor Enrichment page on the Snowplow wiki.
This release also updates the Iglu client used by our Hadoop Enrich and Hadoop Shred components to version 0.4.0. This version lets you fetch your schemas from Iglu registries with authentication support, allowing you to keep your proprietary schemas private.
To use registry authentication, you need to be running the Iglu schema registry server released as part of Iglu R3 Penny Black; the setup guide is on the Iglu wiki. Then, in the Iglu resolver configuration JSON you use with Snowplow, add an apikey to the HTTP repository's connection object, like so:
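A sketch of such a repository entry - the registry name, URI, vendor prefix and API key below are placeholders, not real values:

```json
{
  "name": "Example private registry",
  "priority": 5,
  "vendorPrefixes": [ "com.example" ],
  "connection": {
    "http": {
      "uri": "http://iglu.example.com",
      "apikey": "YOUR-API-KEY-HERE"
    }
  }
}
```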
Note also the change of the resolver configuration's schema version from 1-0-0 to 1-0-1 - the apikey property is only available from version 1-0-1 onwards.
This update also contains two important bug fixes:
We have also:
The recommended AMI version to run Snowplow is now 4.5.0 - update your configuration YAML as follows:
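Assuming the standard EmrEtlRunner config.yml layout, where the AMI version lives under the aws.emr section, the change looks like this:

```yaml
aws:
  emr:
    ami_version: 4.5.0    # bump from your current 4.x value
```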
Next, update your hadoop_enrich and hadoop_shred job versions like so:
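A sketch of the relevant versions block, assuming the enrich.versions layout of the sample configuration; the version numbers are placeholders, so take the exact values for this release from the sample file referenced below:

```yaml
enrich:
  versions:
    hadoop_enrich: x.y.z    # placeholder - use the version shipped with this release
    hadoop_shred: x.y.z     # placeholder - use the version shipped with this release
```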
For a complete example, see our sample config.yml template.
If you want to use an Iglu registry with authentication, add a private apikey to the registry's configuration entry and set the schema version to 1-0-1. Here is an example:
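The sketch below expands the earlier snippet into a full resolver configuration under these assumptions - the private registry's name, URI, vendor prefix and key are all placeholders:

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      },
      {
        "name": "Example private registry",
        "priority": 5,
        "vendorPrefixes": [ "com.example" ],
        "connection": {
          "http": {
            "uri": "http://iglu.example.com",
            "apikey": "YOUR-API-KEY-HERE"
          }
        }
      }
    ]
  }
}
```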
Unfortunately, due to a current limitation in Iglu's authentication system, you'll need to add one entry to the repositories array for each set of schemas with a distinct vendorPrefix within a single registry. We plan on fixing this in an Iglu release soon.
The API Request Enrichment is the first in a series of new, flexible dimension-widening enrichments for Snowplow; we are hard at work on a SQL Query Enrichment, which we will again release alongside an in-depth tutorial.
In the meantime, upcoming Snowplow releases include:
Note that these releases are always subject to change between now and the actual release date.
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.