We are pleased to announce the release of Snowplow 74 European Honey Buzzard. This release adds a Weather Enrichment to the Hadoop pipeline - making Snowplow the first event analytics platform with built-in weather analytics!
The rest of this post will cover the following topics:
- Introducing the weather enrichment
- Configuring the weather enrichment
- Getting help
- Upcoming releases
1. Introducing the weather enrichment
There is a strong body of research suggesting that the weather has a major influence on the behavior of your end-users; for an example, see the paper The Effect of Weather on Consumer Spending (Murray, Di Muro, Finn, Leszczyc, 2010). To perform these kinds of analyses, you need to attach the correct weather to each event before storing and analyzing those events in Redshift, Spark or similar.
The new Weather Enrichment:
- Runs inside our Snowplow Enrichment process
- Looks up the weather for each event from OpenWeatherMap.org
- Caches the weather for this time and place to minimize the number of requests to OpenWeatherMap.org
- Adds the weather, represented by org.openweathermap/weather/jsonschema/1-0-0, to the event
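The lookup, cache and attach steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not Snowplow's Scala implementation: CachedWeatherLookup and enrich are hypothetical names, fetch stands in for a request to OpenWeatherMap.org, bucketing timestamps by hour is an assumed cache granularity, and both the timestamp field and the contexts key that receives the weather are placeholder names. Only geo_latitude and geo_longitude correspond to real enriched-event fields (the ones the MaxMind IP lookups enrichment fills in).

```python
import math
from collections import OrderedDict
from typing import Callable, Dict, Tuple

# Schema name taken from the text; how a real event references it may differ.
WEATHER_SCHEMA = "org.openweathermap/weather/jsonschema/1-0-0"

CacheKey = Tuple[float, float, int]


class CachedWeatherLookup:
    """Hypothetical LRU cache in front of a weather lookup (illustration only)."""

    def __init__(self, fetch: Callable[[float, float, int], Dict],
                 cache_size: int, geo_precision: int) -> None:
        self.fetch = fetch                  # stand-in for an OpenWeatherMap.org request
        self.cache_size = cache_size        # plays the role of cacheSize
        self.geo_precision = geo_precision  # plays the role of geoPrecision
        self.cache: "OrderedDict[CacheKey, Dict]" = OrderedDict()
        self.remote_calls = 0               # lookups that missed the cache

    def _key(self, lat: float, lon: float, ts: int) -> CacheKey:
        p = self.geo_precision
        # Round to the nearest 1/p degree and bucket the Unix timestamp by hour
        # (the hourly bucket is an assumption made for this sketch).
        return (math.floor(lat * p + 0.5) / p,
                math.floor(lon * p + 0.5) / p,
                ts - ts % 3600)

    def lookup(self, lat: float, lon: float, ts: int) -> Dict:
        key = self._key(lat, lon, ts)
        if key in self.cache:
            self.cache.move_to_end(key)     # mark entry as most recently used
            return self.cache[key]
        self.remote_calls += 1
        weather = self.fetch(*key)
        self.cache[key] = weather
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return weather


def enrich(event: Dict, weather: CachedWeatherLookup) -> Dict:
    """Return a copy of the event with the weather attached under a placeholder key."""
    data = weather.lookup(event["geo_latitude"], event["geo_longitude"], event["timestamp"])
    enriched = dict(event)
    enriched["contexts"] = list(event.get("contexts", [])) + [
        {"schema": WEATHER_SCHEMA, "data": data}
    ]
    return enriched


# Usage: a fake fetch function echoes the cache key instead of calling the API.
lookup = CachedWeatherLookup(lambda lat, lon, hour: {"lat": lat, "lon": lon, "hour": hour},
                             cache_size=100, geo_precision=10)
events = [
    {"geo_latitude": 51.5074, "geo_longitude": -0.1278, "timestamp": 1_450_000_000},
    {"geo_latitude": 51.5101, "geo_longitude": -0.1301, "timestamp": 1_450_000_500},
]
enriched = [enrich(e, lookup) for e in events]
print(lookup.remote_calls)  # 1
```

The two sample events are a few hundred metres and under ten minutes apart, so they map to the same cache key and only one remote request is made; the LRU policy keeps the most recently used places and times when the cache is full.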
Note that this release only adds this enrichment for the Snowplow Hadoop pipeline; we will be adding this to the Kinesis pipeline in the next release of that pipeline.
2. Configuring the weather enrichment
To use the new Weather Enrichment functionality you need to:
- Obtain an OpenWeatherMap.org API key that can perform historical requests. Note that you will need to subscribe to a paid plan for historical data
- Enable the MaxMind IP lookups enrichment so that each event has the user’s geo-location attached
- Configure the weather enrichment with your API key, preferred geo-precision and other parameters
The example configuration JSON for this enrichment is as follows:
To go through each of these settings in turn:
- apiKey is the API key you obtained from OpenWeatherMap.org
- cacheSize is the number of requests the underlying Scala Weather client should cache. Setting it to the number of requests your plan allows, plus 1% for errors, should work well
- timeout is the time in seconds after which a request should be considered failed. Note that a failed weather enrichment will cause the whole enriched event to end up in the bad bucket
- apiHost is set to one of several available API hosts; for most cases, history.openweathermap.org should be fine
- geoPrecision is the fraction of one degree to which geo coordinates will be rounded before being stored in the cache. Setting this to 1 gives you ~60km inaccuracy (worst case), while the most precise value, 10, gives you ~6km inaccuracy (worst case)
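As a rough check on the cacheSize and geoPrecision guidance, the arithmetic can be written out in Python. This is a minimal sketch assuming that geoPrecision rounds each coordinate to the nearest 1/geoPrecision of a degree (our reading of "fraction of one degree"); the Scala Weather client's exact rounding rule may differ, and round_coordinate, worst_case_lat_error_km and suggested_cache_size are hypothetical helpers. The error estimate covers only the north-south component, using one degree of latitude ≈ 111.2 km, and the 5,000-request plan is just an example figure.

```python
import math

# One degree of latitude on a sphere with the mean Earth radius (~111.2 km).
EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE_LAT = EARTH_RADIUS_KM * math.pi / 180


def round_coordinate(value: float, geo_precision: int) -> float:
    """Round a coordinate to the nearest 1/geo_precision of a degree, halves rounding up."""
    return math.floor(value * geo_precision + 0.5) / geo_precision


def worst_case_lat_error_km(geo_precision: int) -> float:
    """Largest north-south shift from rounding: half a grid step, converted to km."""
    return 0.5 / geo_precision * KM_PER_DEGREE_LAT


def suggested_cache_size(plan_requests: int) -> int:
    """The plan's request allowance plus 1% headroom for errors, rounded up."""
    return plan_requests + (plan_requests + 99) // 100


print(round_coordinate(51.5074, 1))            # 52.0
print(round_coordinate(51.5074, 10))           # 51.5
print(round(worst_case_lat_error_km(1), 1))    # 55.6
print(round(worst_case_lat_error_km(10), 1))   # 5.6
print(suggested_cache_size(5000))              # 5050
```

Under these assumptions the north-south error of half a grid step comes to about 55.6 km at geoPrecision 1 and 5.6 km at 10, in line with the ~60km and ~6km figures in the list; longitude error is not modelled here. Integer arithmetic is used for the cache size so that floating-point noise in 1.01 cannot push the result up by one.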
To take advantage of this new enrichment, update the “hadoop_enrich” jar version in the “emr” section of your configuration YAML:
Make sure to add a weather_enrichment_config.json configured as above into your enrichments folder too.
Finally, if you are using Snowplow with Amazon Redshift, you will need to deploy the following table into your database:
3. Getting help
For more details on this release, please check out the R74 European Honey Buzzard release notes on GitHub. Specific documentation on the new enrichment is available here:
- The Weather enrichment page
4. Upcoming releases
By popular demand, we are adding a section to our release blog posts to trail upcoming Snowplow releases. Note that these releases are always subject to change between now and the actual release date.
Upcoming releases are:
- Release 75 Long-Legged Buzzard, which will add support for ingesting events from SendGrid and Urban Airship into Snowplow
- Release 76 Bird TBC, which will refresh our EmrEtlRunner app, including updating Snowplow to use the EMR 4.x AMI series
- Release 77 Bird TBC, which will bring the Kinesis pipeline up to date with the most recent Scala Common Enrich releases. This will also include click redirect support in the Scala Stream Collector