We are pleased to announce the release of Snowplow 0.9.9. This is primarily a comprehensive bug fix release, although it also adds the new campaign_attribution enrichment to our enrichment registry. Here are the sections after the fold:
- The campaign_attribution enrichment
- Clojure Collector fixes
- StorageLoader fixes
- EmrEtlRunner fixes and enhancements
- Hadoop Enrich fixes and enhancements
- Documentation and help
Snowplow has five fields relating to campaign attribution: mkt_medium, mkt_source, mkt_term, mkt_content and mkt_campaign. In previous versions of Snowplow, the values of these fields were based on the corresponding five utm_ fields supported by Google for manual campaign tagging.
The new campaign_attribution enrichment allows you to change this behaviour. For each of the five fields, you can specify an array of querystring fields to check for the appropriate value.
This is the configuration to use if you want to duplicate the functionality of previous Snowplow versions, populating the campaign fields from the standard utm_ querystring parameters:
The JSON has the same format as the JSONs for the other enrichments: static vendor fields, an enabled field which can be used to turn the enrichment off, and a parameters field containing data specific to the enrichment:
- mapping must, for now, have the value “static”. See the Roadmap section below for an explanation of our plans for this field.
- fields matches each of the five Snowplow mkt_ fields with an ordered list of querystring fields from which to populate it.
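The rules for the parameters field can be sketched as a small Python validator. This is an illustration under stated assumptions, not Snowplow's implementation. The names validate_parameters, MKT_FIELDS and GOOGLE_STYLE are hypothetical, and keying the fields object by the mkt_ field names is our simplification, since the real enrichment JSON may spell the keys differently.

```python
# Hypothetical validator for the "parameters" object of a campaign_attribution
# config. Keys are the Snowplow mkt_ field names for readability; the real
# enrichment JSON may spell them differently.
MKT_FIELDS = ("mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign")


def validate_parameters(parameters: dict) -> list:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    if parameters.get("mapping") != "static":
        # "static" is the only mapping accepted in this release
        problems.append('mapping must be "static"')
    for name, qs_fields in parameters.get("fields", {}).items():
        if name not in MKT_FIELDS:
            problems.append(f"unknown field: {name}")
        elif not isinstance(qs_fields, list) or not all(isinstance(f, str) for f in qs_fields):
            problems.append(f"{name} must map to a list of querystring field names")
    return problems


# Google-style manual tagging: each mkt_ field is read from its utm_ counterpart,
# e.g. mkt_medium <- utm_medium.
GOOGLE_STYLE = {
    "mapping": "static",
    "fields": {f: ["utm_" + f[len("mkt_"):]] for f in MKT_FIELDS},
}

print(validate_parameters(GOOGLE_STYLE))  # []
```

Returning a list of problems rather than raising on the first one makes it easy to report every mistake in a config file at once.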
With the above configuration, if the querystring contained
then the fields would be populated like this:
You can have more than one querystring field in each array:
The first field name found takes precedence. In this example, if there is a “utm_medium” field in the querystring, its value will be used for mkt_medium; otherwise, if there is a “medium” field in the querystring, its value will be used; otherwise, the mkt_medium field will be left unpopulated.
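The precedence rule amounts to an ordered lookup for each field. The following minimal Python sketch assumes that the querystring has already been separated from the URL. The function attribute_campaign is hypothetical. Its handling of empty or repeated parameters is our own choice rather than documented enrichment behaviour: parse_qs drops blank values, and the sketch takes the first occurrence of a repeated parameter.

```python
from urllib.parse import parse_qs


def attribute_campaign(querystring: str, fields: dict) -> dict:
    """Populate each mkt_ field from the first listed querystring field present.

    fields maps a Snowplow mkt_ field name to an ordered list of querystring
    field names, mirroring the enrichment's "fields" parameter.
    """
    params = parse_qs(querystring)  # name -> list of URL-decoded values
    result = {}
    for mkt_field, candidates in fields.items():
        value = None  # stays None (unpopulated) if no candidate is present
        for name in candidates:
            if name in params:
                value = params[name][0]  # first occurrence wins if repeated
                break
        result[mkt_field] = value
    return result


example_fields = {"mkt_medium": ["utm_medium", "medium"], "mkt_source": ["utm_source"]}
print(attribute_campaign("medium=email&utm_source=newsletter", example_fields))
# {'mkt_medium': 'email', 'mkt_source': 'newsletter'}
print(attribute_campaign("utm_medium=cpc&medium=email", example_fields))
# {'mkt_medium': 'cpc', 'mkt_source': None}
```

In the second call, utm_medium is present and so wins over medium, exactly as the precedence rule describes.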
We plan on extending the campaign_attribution enrichment to also extract the advert’s click ID, if found (#1073). This will serve as a good basis for more granular campaign analytics.
We have also sketched out a potential option to set the
We have fixed a pair of bugs which caused issues with the IP addresses recorded by the Clojure Collector, especially when running in a VPC with multiple nodes. The tickets are here:
- Fixed regression in log record format caused by #854 (#992)
- Correctly handles multiple IPs in X-Forwarded-For (#970)
Thank you for your patience in the resolution of these issues – we have had the updated version in test with various respondents and everything seems to be functioning correctly now.
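For background, X-Forwarded-For is a comma-separated list. Each proxy or load balancer appends the address it received the request from, so a single header can carry several IPs. A common way to recover the client address is to take the leftmost entry, as sketched below in Python. We assume, rather than confirm, that this matches what the Clojure Collector now does. client_ip is a hypothetical name.

```python
def client_ip(x_forwarded_for, remote_addr):
    """Return the originating client IP for a request.

    Each proxy appends to X-Forwarded-For, so the leftmost entry is the
    original client as claimed by the request (clients can spoof it).
    Falls back to the socket's remote address when the header is absent.
    """
    if x_forwarded_for:
        first = x_forwarded_for.split(",")[0].strip()
        if first:
            return first
    return remote_addr


print(client_ip("203.0.113.7, 10.0.0.12", "10.0.0.5"))  # 203.0.113.7
print(client_ip(None, "10.0.0.5"))  # 10.0.0.5
```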
There was an issue (#1012) where StorageLoader was attempting to fetch JSON Path files from the main Snowplow Hosted Assets bucket, which is in eu-west-1. For users loading shredded JSONs into a Redshift instance in another region, the COPY FROM JSON was failing because the JSON Path files must be in the same region as the target Redshift cluster.
We have fixed this by mirroring all of our hosted assets (including JSON Path files) to per-region buckets (s3://snowplow-hosted-assets-us-east-1 etc.). StorageLoader now chooses the correct Snowplow Hosted Assets bucket based on the region of the target Redshift database.
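The bucket choice can be pictured as a simple naming rule, sketched here in Python. The function hosted_assets_bucket is hypothetical. It rests on two assumptions: that eu-west-1 keeps the original bucket name s3://snowplow-hosted-assets, and that every other region follows the "-<region>" suffix pattern shown for us-east-1. Check the actual bucket list before relying on either.

```python
def hosted_assets_bucket(region: str) -> str:
    """Return the hosted-assets bucket assumed to be mirrored into `region`.

    Assumption: eu-west-1 hosts the original bucket; other regions append
    "-<region>" to the same base name.
    """
    base = "s3://snowplow-hosted-assets"
    if region == "eu-west-1":
        return base
    return f"{base}-{region}"


print(hosted_assets_bucket("us-east-1"))  # s3://snowplow-hosted-assets-us-east-1
```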
We have resolved two issues to help EmrEtlRunner run more smoothly:
- We fixed a regression with --process-enrich; thanks to community member Rob Kingston for spotting this (#1089)
- If there are no rows to process, EmrEtlRunner now correctly returns a 0 exit code at the command line, rather than 1 as before (#1018)
To make EmrEtlRunner more robust when it is run very frequently (e.g. every hour), we have added checks that the :shredded:good folders are empty before starting jobflow steps that would write additional data to them. Please see issue #1124 for more details.
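The idea behind the pre-flight check is sketched below in Python. This is an illustration, not EmrEtlRunner's own code. FolderNotEmptyError and check_empty are hypothetical names, and existing_keys stands in for an S3 listing of the folder that you would obtain from your S3 client of choice.

```python
class FolderNotEmptyError(Exception):
    """Raised when a target folder already holds data from an earlier run."""


def check_empty(folder: str, existing_keys: list) -> None:
    """Refuse to start a step that would write into a non-empty folder.

    existing_keys is assumed to be the list of object keys currently
    stored under `folder`.
    """
    if existing_keys:
        raise FolderNotEmptyError(
            f"{folder} is not empty ({len(existing_keys)} objects); "
            "move the previous run's output before retrying"
        )


check_empty("s3://example-bucket/shredded/good/", [])  # passes silently
```

Failing fast here is safer than letting a new run append to leftover output, which could otherwise lead to duplicate loads downstream.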
0.9.9 fixes a bug in how Snowplow’s Hadoop Enrichment process validates an incoming (i.e. tracker-generated) event_id UUID. According to the UUID specification (RFC 4122), hex digits are case-insensitive on input, so UUIDs containing capital letters are valid, but the previous validation did not accept them. This release fixes the bug by downcasing all incoming UUIDs.
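In Python terms, the fix looks roughly like the sketch below. It matches the 8-4-4-4-12 hex layout case-insensitively, then returns the lowercase form. normalize_event_id is a hypothetical name, and the actual fix lives in Hadoop Enrich itself.

```python
import re

# 8-4-4-4-12 hex digits; RFC 4122 treats a-f as case-insensitive on input.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def normalize_event_id(event_id: str) -> str:
    """Validate a tracker-supplied event_id and return it in lower case."""
    if not UUID_RE.fullmatch(event_id):  # fullmatch: no stray prefix/suffix
        raise ValueError(f"not a valid UUID: {event_id!r}")
    return event_id.lower()


print(normalize_event_id("C7E2A8E8-3B1F-4F2B-9D1A-5E6F7A8B9C0D"))
# c7e2a8e8-3b1f-4f2b-9d1a-5e6f7a8b9c0d
```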
This release also supports trackers sending in the original client’s user agent via the &ua= parameter (new in the Snowplow Tracker Protocol). This is useful when the tracker is not running on the client that generated the event, e.g. when the Ruby Tracker reports a user’s checkout event from a Rails server.
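The resulting precedence can be sketched in Python. We assume that a tracker-supplied ua value overrides the HTTP User-Agent header, which is otherwise the server's own. effective_useragent and the header value "server-agent" are hypothetical.

```python
from urllib.parse import parse_qs


def effective_useragent(querystring: str, header_useragent):
    """Prefer the tracker-supplied ua parameter over the HTTP User-Agent header.

    A server-side tracker sends its own User-Agent header, so the end user's
    browser string has to travel in the ua querystring parameter instead.
    """
    values = parse_qs(querystring).get("ua")  # URL-decoded, e.g. %2F -> /
    if values:
        return values[0]
    return header_useragent


print(effective_useragent("e=pv&ua=Mozilla%2F5.0", "server-agent"))  # Mozilla/5.0
print(effective_useragent("e=pv", "server-agent"))  # server-agent
```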
You need to update EmrEtlRunner and StorageLoader to the latest code (0.9.2 and 0.3.3 respectively) on GitHub:
This release bumps the Hadoop Enrichment process to version 0.8.0.
In your EmrEtlRunner’s config.yml file, update your Hadoop Enrich job’s version to 0.8.0, like so:
For a complete example, see our sample
If you upgrade Hadoop Enrich to version 0.8.0 as above, you MUST also follow these steps, or else campaign attribution will be disabled.
To use the new enrichment, add a “campaign_attribution.json” file containing a campaign_attribution enrichment JSON to your enrichments directory. Note that the mkt_ fields are no longer populated from the utm_ querystring fields by default. To reproduce the previous behaviour, you must use the Google-like manual tagging configuration.
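A missing file silently disables campaign attribution after the upgrade, so it can be worth checking for it before deploying. Here is a small Python sketch, assuming your enrichments directory is a local path. campaign_attribution_configured and the directory name in the usage line are hypothetical.

```python
from pathlib import Path


def campaign_attribution_configured(enrichments_dir) -> bool:
    """Return True if campaign_attribution.json exists in enrichments_dir."""
    return (Path(enrichments_dir) / "campaign_attribution.json").is_file()


if not campaign_attribution_configured("config/enrichments"):
    print("Warning: campaign_attribution.json not found; mkt_ fields will not be populated")
```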
This release bumps the Clojure Collector to version 0.8.0.
To upgrade to this release:
- Download the new warfile by right-clicking on this link and selecting “Save As…”
- Log in to your Amazon Elastic Beanstalk console
- Browse to your Clojure Collector’s application
- Click “Upload New Version” and upload your warfile
Documentation relating to enrichments is available on the wiki: