Unified Log London 3 with Apache Kafka and Samza at State

28 May 2015  •  Alex Dean

Last week we held the third Unified Log London meetup here in London. Huge thanks to Just Eat for hosting us in their offices and keeping us all fed with pizza and beer!


More on the event after the jump:

There were two talks at the meetup:

  • I gave a recap on the Unified Log “manifesto” for new ULPers, with my regular presentation on “Why your company needs a Unified Log”
  • Mischa Tuffield, CTO at State, gave an excellent talk on implementing a Unified Log at State to meet various operational and analytical data requirements, all using Apache Kafka and Samza

The meetup had a great mix of Unified Log practitioners and people just starting to explore the concept. It was particularly encouraging to see such an interactive, “salon”-style atmosphere in the discussion, which continued late into the evening!

1. Why your company needs a Unified Log

In this talk, I summarized the emergence of the Unified Log concept, talking through the “three eras” of data processing and explaining why it makes sense to restructure your company around a Unified Log. Regular readers of this blog may well have seen a version of this presentation already, included here for completeness:

2. Unified Log at State

We were lucky enough to have Mischa Tuffield and Dan Harvey, Data Architect at State, talk us through their implementation of the Unified Log concept at State. Learning about the real-world experience of implementing ULP is a key part of Unified Log London, so it was great to hear Mischa and Dan’s story. Mischa’s slides are here:

Key building blocks of State’s Unified Log implementation are:

Given our focus at Snowplow on the various analytical uses of the Unified Log, it was really helpful for me to get Mischa and Dan’s more operational/transactional-focused perspective on the Unified Log.

3. Big themes

There were some really interesting themes that emerged during the talks and the subsequent discussion. To highlight just three:

  • Stream design - specifically, whether to create an individual stream (a topic, in Kafka parlance) for each entity type, or whether to have streams that carry every entity type and are tied only to a processing stage. State follow the first approach, Snowplow the second
  • Event sourcing versus entity snapshotting - this really warrants a full blog post, but there was some healthy debate about whether an individual event should capture a complete entity snapshot or just a delta (i.e. just the properties that have changed). There was a general feeling (which we share at Snowplow) that entity snapshots are much safer in the face of potentially lossy systems: a dropped delta leaves downstream state wrong until that property next changes, whereas the next snapshot to arrive corrects it
  • The importance of a schema registry - in the Unified Log model, your events’ schemas form the sole contract between your various stream processing applications, and so having a single source of truth for these schemas - a registry/repository - becomes essential
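
To make the snapshot-versus-delta point concrete, here is a minimal Python sketch of what happens when one event is lost in transit. It is an illustration only, not State's or Snowplow's implementation: the event shapes (`changed`, `entity`), the example user entity and the helper names `replay_deltas`, `replay_snapshots` and `drop` are all hypothetical.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def replay_deltas(events: List[Event]) -> Dict[str, Any]:
    """Rebuild entity state by merging each delta's changed properties in order."""
    state: Dict[str, Any] = {}
    for event in events:
        state.update(event["changed"])
    return state


def replay_snapshots(events: List[Event]) -> Dict[str, Any]:
    """Rebuild entity state by taking the most recent complete snapshot."""
    state: Dict[str, Any] = {}
    for event in events:
        state = dict(event["entity"])
    return state


def drop(events: List[Event], index: int) -> List[Event]:
    """Simulate a lossy system by removing the event at the given position."""
    return events[:index] + events[index + 1:]


# The same three changes to a hypothetical user entity, modelled both ways.
delta_events: List[Event] = [
    {"changed": {"name": "Ada", "plan": "free"}},  # user created
    {"changed": {"plan": "pro"}},                  # user upgrades
    {"changed": {"name": "Ada L."}},               # user renames
]
snapshot_events: List[Event] = [
    {"entity": {"name": "Ada", "plan": "free"}},
    {"entity": {"name": "Ada", "plan": "pro"}},
    {"entity": {"name": "Ada L.", "plan": "pro"}},
]

# With no loss, both models agree on the final state.
expected = {"name": "Ada L.", "plan": "pro"}
assert replay_deltas(delta_events) == expected
assert replay_snapshots(snapshot_events) == expected

# Lose the second event (the upgrade) in each stream.
lossy_delta_state = replay_deltas(drop(delta_events, 1))
lossy_snapshot_state = replay_snapshots(drop(snapshot_events, 1))
```

In this sketch the delta stream ends up with the plan still set to "free", because the upgrade was only ever recorded in the lost event, while the snapshot stream recovers the correct state from the final event. The trade-off is that every snapshot event carries the whole entity, so events are larger.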

4. Thanks and next event

It was a great meetup - in particular it’s exciting to see the Unified Log patterns becoming such a hot discussion topic. A big thank you to Raj Singh, Peter Mounce and the Just Eat Engineering team for being such excellent hosts, and a warm thanks to Mischa and Dan for giving us the inside track on Unified Log at State!

Do please join the group to be kept up-to-date with upcoming meetups, and if you would like to give a talk, please email us on unified-meetup@snowplowanalytics.com.