Building robust data pipelines in Scala – Session at Scala eXchange, December 2014


It was great to have the opportunity to speak at Scala eXchange last week in London on the topic of “Building robust data pipelines in Scala: the Snowplow experience”.

It was my first time speaking at a conference dedicated to Scala – and it was fantastic to see such widespread adoption of Scala in the UK and Europe. It was also great meeting up with Snowplow users and contributors face-to-face for the first time!

Many thanks to the team at Skills Matter for organizing such a great conference.

Below the fold I will briefly cover:

  1. Building robust data pipelines in Scala
  2. My highlights from Scala eXchange

Building robust data pipelines in Scala

This session was an opportunity for me to “step back” a little and think about how and why we use Scala to enforce robust event processing at Snowplow. We have always been strong proponents of what we have called “high-fidelity analytics” – in this talk I explored how we use Scalding, the Scalaz toolkit and some simple design patterns to deliver this robustness:

Scala eXchange: Building robust data pipelines in Scala from Alexander Dean

It was a very experienced and technical audience, who asked some great questions. The pattern I presented which seemed to resonate most was “railway-oriented programming”, a term coined in the Railway oriented programming blog post by functional programmer Scott Wlaschin.

At Snowplow we came to Scott’s “railway-oriented” approach independently via Scalaz’s Validation type, which today underpins all of our event validation and processing. Scala and big data guru Dean Wampler was in the audience and summed up the railway approach in a single tweet:

I really enjoyed giving the talk – it was a great opportunity to shine a technical light on the foundational work we do at Snowplow on event quality and pipeline robustness. You can see a video version of the session online on the Skills Matter website. Expect a chapter on “railway-oriented programming” in Unified Log Processing in due course!
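For readers who have not met the railway-oriented pattern before, here is a minimal sketch of the idea in plain Scala. It is not Snowplow's code and it deliberately does not use Scalaz: `Validated`, `Valid`, `Invalid`, `map2`, `Event` and the field names `user_id` and `timestamp` are all hypothetical names invented for this illustration, standing in for what a Validation-style type provides. Each check either stays on the "success track" or switches to the "failure track", and combining two checks collects every error rather than stopping at the first one.

```scala
object RailwaySketch {

  // Hand-rolled stand-in for a Validation-style type (an assumption for
  // illustration only; this is not the Scalaz API).
  sealed trait Validated[+A]
  final case class Valid[A](value: A) extends Validated[A]
  final case class Invalid(errors: List[String]) extends Validated[Nothing]

  // Combine two independent results. If both are valid, apply f;
  // otherwise stay on the failure track and keep all error messages.
  def map2[A, B, C](va: Validated[A], vb: Validated[B])(f: (A, B) => C): Validated[C] =
    (va, vb) match {
      case (Valid(a), Valid(b))       => Valid(f(a, b))
      case (Invalid(e1), Invalid(e2)) => Invalid(e1 ++ e2)
      case (Invalid(e), _)            => Invalid(e)
      case (_, Invalid(e))            => Invalid(e)
    }

  // Hypothetical, heavily simplified event with two fields.
  final case class Event(userId: String, timestamp: Long)

  def validateUserId(raw: Map[String, String]): Validated[String] =
    raw.get("user_id").filter(_.nonEmpty) match {
      case Some(id) => Valid(id)
      case None     => Invalid(List("user_id is missing or empty"))
    }

  def validateTimestamp(raw: Map[String, String]): Validated[Long] =
    raw.get("timestamp") match {
      case None => Invalid(List("timestamp is missing"))
      case Some(s) =>
        scala.util.Try(s.toLong).toOption match {
          case Some(t) if t >= 0 => Valid(t)
          case _                 => Invalid(List(s"timestamp is not a non-negative integer: $s"))
        }
    }

  // Both checks always run; a bad event yields every reason it failed.
  def validateEvent(raw: Map[String, String]): Validated[Event] =
    map2(validateUserId(raw), validateTimestamp(raw))((id, ts) => Event(id, ts))
}
```

Calling `RailwaySketch.validateEvent` on a raw map with both fields present and well-formed yields a `Valid` wrapping the `Event`; calling it on an empty map yields a single `Invalid` carrying both error messages. That accumulation is the property that makes this style attractive for event pipelines, where a rejected event should ideally report everything wrong with it in one pass.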

My highlights from Scala eXchange

The Skills Matter team succeeded in packing a huge number of great sessions into Scala eXchange’s two days. Here were some of my highlights:

Of course these were just my highlights – the two days were packed with great content and interactions across the four tracks. In particular, I was sorry to miss Dean Wampler’s second day keynote on why Scala is dominating the big data landscape – something we definitely concur with at Snowplow.

Many thanks to Skills Matter and all the organizers for an excellent conference!

