Snowplow R98 Argentomagus released

05 January 2018  •  Ben Fradet

We are pleased to announce the release of Snowplow R98 Argentomagus. This realtime pipeline release brings some critical security and data quality improvements, new Scala Stream Collector capabilities, and realtime pipeline support for the four webhooks introduced in R97 Knossos.

The new features for the Scala Stream Collector were driven by community member Rick Bolkey from OneSpot - huge thanks Rick!

Read on for more information on R98 Argentomagus, named after the ancient Roman city located in central France:

  1. Stream Enrich: better timestamp validation
  2. Scala Stream Collector: configurable Flash cross-domain policy
  3. Other Scala Stream Collector improvements
  4. Upgrading
  5. Roadmap
  6. Getting help


1. Stream Enrich: better timestamp validation

Prior to this release, both the realtime and batch enrichment processes would let nonsensical timestamps, such as 22017-11-28 10:01:36, through. Those events would then fail when being loaded into the database of your choice.

With Argentomagus, our realtime enrichment process will now reject those events, which will be routed to the “bad rows” event stream.

This data quality improvement will make its way to the batch pipeline in the next release.
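To make the idea concrete, here is a minimal Python sketch of this kind of check. It is not the Stream Enrich implementation (which is written in Scala): the function name validate_timestamp_ms and the rule that a timestamp is only acceptable if its year fits in four digits are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def validate_timestamp_ms(epoch_ms):
    """Return (True, timestamp) for a plausible timestamp, (False, reason) otherwise.

    Assumption: "plausible" means the year fits in four digits, which is also
    the range Python's datetime can represent (years 1 to 9999).
    """
    try:
        return True, EPOCH + timedelta(milliseconds=epoch_ms)
    except OverflowError:
        # In a real pipeline this event would be routed to the bad rows stream.
        return False, f"timestamp {epoch_ms} ms is outside years 1-9999"


# 2017-11-28 10:01:36 UTC is accepted
is_valid, value = validate_timestamp_ms(1511863296000)

# A timestamp roughly in the year 22017 is rejected
is_valid_far_future, reason = validate_timestamp_ms(632650000000000)
```

Rejecting at enrichment time means the failure shows up with a reason attached, rather than as a load error further downstream.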

2. Scala Stream Collector: configurable Flash cross-domain policy

On the security side of things, we have made the Flash cross-domain policy of the Scala Stream Collector configurable.

First, what is a Flash cross-domain policy? Quoting the Adobe website:

A cross-domain policy file is an XML document that grants a web client, such as Adobe Flash Player or Adobe Acrobat (though not necessarily limited to these), permission to handle data across domains. When clients request content hosted on a particular source domain and that content make requests directed towards a domain other than its own, the remote domain needs to host a cross-domain policy file that grants access to the source domain, allowing the client to continue the transaction.

To allow a Flash media player hosted on another web server to access content from the Adobe Media Server web server, we require a crossdomain.xml file. A typical use case will be HTTP streaming (VOD or Live) to a Flash Player. The crossdomain.xml file grants a web client the required permission to handle data across multiple domains.

A cross-domain policy file gives the necessary permissions when, for example, you are trying to make a request to a Snowplow collector from a Flash game given that both are running on different hosts.

Until now, the Scala Stream Collector embedded a very permissive cross-domain policy file, giving permission to any domain and not enforcing HTTPS:

<?xml version="1.0"?>
<cross-domain-policy>
  <allow-access-from domain="*" secure="false" />
</cross-domain-policy>

With Release 98, we’re completely removing the /crossdomain.xml route by default - it will have to be manually re-enabled by adding the following crossDomain section to the configuration:

collector {
  # ...
  crossDomain {
    # Domain that is granted access, *.acme.com will match http://acme.com and http://sub.acme.com
    domain = "*"
    # Whether to only grant access to HTTPS or both HTTPS and HTTP sources
    secure = true
  }
}
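As a rough illustration of how these two settings map onto the policy file, the following Python sketch renders a crossdomain.xml body from a domain and a secure flag. The function render_cross_domain_policy is hypothetical and is not part of the collector; the output shape simply mirrors the policy file shown earlier.

```python
from xml.sax.saxutils import quoteattr


def render_cross_domain_policy(domain, secure):
    """Build a crossdomain.xml body from the two crossDomain settings.

    Hypothetical helper: it mirrors the shape of the policy file, not the
    collector's actual code.
    """
    secure_attr = "true" if secure else "false"
    return (
        '<?xml version="1.0"?>\n'
        "<cross-domain-policy>\n"
        # quoteattr escapes the value and wraps it in quotes
        f"  <allow-access-from domain={quoteattr(domain)} secure=\"{secure_attr}\" />\n"
        "</cross-domain-policy>"
    )


# Grant access to acme.com and its subdomains, HTTPS sources only
policy = render_cross_domain_policy("*.acme.com", secure=True)
print(policy)
```

Narrowing domain from "*" to the hosts that actually serve your Flash content, and keeping secure = true, keeps the policy as tight as possible if you do need to re-enable the route.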

3. Other Scala Stream Collector improvements

Rick Bolkey from OneSpot has contributed a whole suite of improvements to the Scala Stream Collector - much appreciated, Rick.

3.1 URL redirect replacement macro

This new feature scans your redirect URL for a placeholder and replaces it with the network_userid. This is a powerful tool for performing cookie matching, aka “cookie sync”, which lets you share your Snowplow third-party cookie IDs with an ad platform or similar.

As an example, let’s say you’ve enabled this feature by adding the following to your configuration:

collector {
  # ...
  redirectMacro {
    enabled = true
    placeholder = "[TOKEN]"
  }
}

And you’re making a redirect request to:

http://your-collector-endpoint/r/tp2?u=http%3A%2F%2Fexample.com%3Fnuid%3D[TOKEN]

The redirect will point to:

http://example.com?nuid=123

Where 123 is the network_userid.
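The substitution itself can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the collector's code: the function resolve_redirect and its parameters are hypothetical, and it assumes the redirect target is carried in the u query parameter, as in the example.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_redirect(request_url, network_userid, placeholder="[TOKEN]", enabled=True):
    """Return the redirect target, with the placeholder swapped for the user ID.

    Hypothetical helper: reads the target from the `u` query parameter and
    applies the macro only when it is enabled.
    """
    query = parse_qs(urlsplit(request_url).query)
    target = query["u"][0]  # parse_qs has already percent-decoded the value
    if enabled:
        target = target.replace(placeholder, network_userid)
    return target


request = "http://your-collector-endpoint/r/tp2?u=http%3A%2F%2Fexample.com%3Fnuid%3D[TOKEN]"
redirect = resolve_redirect(request, network_userid="123")
print(redirect)  # http://example.com?nuid=123
```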

3.2 Preserving the HTTP scheme when leveraging cookie bounce

In Snowplow R93 Virunum, we introduced cookie bounce. The limitation of this feature was that, when running Scala Stream Collectors behind a load balancer, redirects would lose the original request’s scheme and http would always be assumed.

Now you can leverage a header specifying the original scheme and use it in your redirect with the following configuration:

collector {
  # ...
  cookieBounce {
    # ...
    forwardedProtocolHeader = "X-Forwarded-Proto"
  }
}

Note that for AWS Classic ELB, the original request’s scheme is contained in the X-Forwarded-Proto header; your load balancer may use a different header.
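The logic behind this setting can be sketched as follows in Python. The function redirect_scheme is hypothetical, and the fallback to http when the header is absent or unrecognised is an assumption based on the previous behaviour described above.

```python
def redirect_scheme(headers, forwarded_protocol_header="X-Forwarded-Proto", default="http"):
    """Pick the scheme for a cookie bounce redirect.

    Hypothetical helper: uses the configured header when it carries a known
    scheme, and otherwise falls back to the default.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(forwarded_protocol_header.lower(), "").strip().lower()
    return value if value in ("http", "https") else default


# Behind a load balancer that terminates TLS and forwards the original scheme
behind_lb = redirect_scheme({"x-forwarded-proto": "https"})

# No header present: the previous behaviour (http) applies
direct = redirect_scheme({})
```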

3.3 Bypassing Akka-HTTP partial URL decoding of redirects

When using redirects, the Scala Stream Collector would leverage the built-in Location header provided by Akka-HTTP, the HTTP server library used by the Scala Stream Collector.

However, if this redirect contained a URL as a query parameter, this URL would be partially decoded and would not be resolvable. This has been fixed in Argentomagus.
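To see why decoding a nested URL is harmful, consider the Python sketch below. It illustrates the general class of problem rather than Akka-HTTP's exact behaviour, and the URLs are made up for the example: once the inner URL loses its percent-encoding, its own query parameters leak into the outer URL.

```python
from urllib.parse import parse_qs, urlsplit

# Redirect target whose `next` parameter is itself a percent-encoded URL
intact = "http://example.com/?next=http%3A%2F%2Fother.com%2F%3Fa%3D1%26b%3D2"

# The same target after the nested URL has been decoded in transit
decoded = "http://example.com/?next=http://other.com/?a=1&b=2"

intact_params = parse_qs(urlsplit(intact).query)
decoded_params = parse_qs(urlsplit(decoded).query)

# Intact: `next` carries the whole inner URL, including both of its parameters
print(intact_params)

# Decoded: `b` is now read as a parameter of the outer URL and `next` is truncated
print(decoded_params)
```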

4. Upgrading

The realtime applications for R98 Argentomagus are available at the following locations:

http://dl.bintray.com/snowplow/snowplow-generic/snowplow_scala_stream_collector_0.12.0.zip
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_stream_enrich_0.13.0.zip

Docker images for those new artifacts will follow shortly.

5. Roadmap

Upcoming Snowplow releases will include:

6. Getting help

For more details on this release, please check out the release notes on GitHub.

If you have any questions or run into any problems, please visit our Discourse forum.