Looking back on 2015: Most read blogposts

24 December 2015  •  Christophe Bogaert

2015 is drawing to a close, so we decided to crunch our own numbers in Redshift and share which blogposts were read the most. The Snowplow team published 82 new posts in 2015, and readers spent more than 2,953 hours reading content on our blog (a metric we calculated using page pings). Apache Spark and AWS Lambda were the topics that resonated most with our readers. We will continue to write about both topics, and many others, in 2016.
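For readers curious how a time-spent figure can be derived from page pings, here is a minimal sketch in Python. It is not the query we ran in Redshift. It assumes each ping arrives as a hypothetical `(user_id, page_path, epoch_seconds)` tuple and that pings fire on a 10-second heartbeat; both the field names and the interval are illustrative assumptions. The idea is to count the distinct heartbeat slots in which a reader was active on a page and multiply by the heartbeat length.

```python
from collections import defaultdict

# Assumed ping interval in seconds; the real tracker setting may differ.
HEARTBEAT_SECONDS = 10


def engaged_seconds(events, heartbeat=HEARTBEAT_SECONDS):
    """Estimate time spent per page from page pings.

    `events` is an iterable of (user_id, page_path, epoch_seconds) tuples,
    a hypothetical shape chosen for this illustration.
    """
    slots_per_page = defaultdict(set)
    for user_id, page_path, ts in events:
        # Round each ping down to its heartbeat slot so that several pings
        # from the same reader in the same slot are counted only once.
        slots_per_page[page_path].add((user_id, ts // heartbeat))
    return {page: len(slots) * heartbeat for page, slots in slots_per_page.items()}


pings = [
    ("u1", "/blog/spark", 100),
    ("u1", "/blog/spark", 110),
    ("u1", "/blog/spark", 113),  # falls in the same slot as the ping at 110
    ("u2", "/blog/spark", 200),
    ("u1", "/blog/lambda", 300),
]
totals = engaged_seconds(pings)
print(totals)
```

Summing the per-page values and dividing by 3600 would give a total in hours. Counting distinct slots rather than raw pings keeps duplicate or late events from inflating the estimate.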

The 10 most read posts in 2015:

  1. First experiments with Apache Spark at Snowplow
  2. Apache Spark Streaming example project released
  3. AWS Lambda Node.js example project released
  4. AWS Lambda Scala example project released
  5. Modeling events through entity snapshotting
  6. Snowplow 64 Palila released with support for data models
  7. Spark Example Project 0.3.0 released for getting started with Apache Spark on EMR
  8. Schema Guru 0.1.0 released for deriving JSON Schemas from JSONs
  9. Samza Scala example project released
  10. JSON schemas for Redshift datatypes

Some older posts remained popular as well. These were the most read posts from the archives:

  1. Writing Hive UDFs - a tutorial
  2. Dealing with Hadoop’s small files problem
  3. Spark Example Project released for running Spark jobs on EMR
  4. Amazon Kinesis tutorial - a getting started guide
  5. Introducing self-describing JSONs

We hope to publish even more great content in 2016! Make sure to follow us on Twitter to stay up to date.