Kinesis S3 0.5.0 released

07 July 2017  •  Ben Fradet

We are proud to be releasing version 0.5.0 of Kinesis S3, our project dedicated to sinking Kinesis streams, including Snowplow raw and enriched event streams, to S3. This release revolves around community-driven improvements as well as the modernization of the project.

This post will cover:

  1. Fix silent suppression of failures
  2. Community contributions
  3. Project modernization
  4. Roadmap
  5. Contributing

1. Fix silent suppression of failures

We’ve uncovered a situation where failures occurring before the serialization of the records stored in Kinesis were silently dismissed. This release introduces a fix for this, so we strongly recommend that everyone using Kinesis S3 upgrade to 0.5.0.

The details of the issue and fix can be found in issue 101.

2. Community contributions

This release has been largely driven by some fantastic community contributions, which we will detail here.

2.1 Newline at the end of gzipped files

Kacper Bielecki from LiveIntent discovered that the last record of gzipped files was dismissed because the file did not end with a newline, and contributed a fix. Thanks a lot, Kacper!
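
As a rough illustration of why the trailing newline matters, here is a minimal Python sketch (not the project's own Scala code). It assumes a strict line-oriented consumer that only accepts newline-terminated records, which is one plausible way for a final record to go missing; `write_gzipped` and `read_terminated_records` are hypothetical helpers introduced purely for this example.

```python
import gzip
import io


def write_gzipped(records, trailing_newline=True):
    """Gzip newline-delimited records; optionally terminate the last one too."""
    body = b"\n".join(records)
    if trailing_newline and records:
        body += b"\n"
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(body)
    return buffer.getvalue()


def read_terminated_records(blob):
    """Hypothetical strict reader: keeps only records that end with a newline."""
    data = gzip.decompress(blob)
    # Everything after the final newline is treated as incomplete and dropped.
    *complete, _unterminated = data.split(b"\n")
    return complete


records = [b"event-1", b"event-2", b"event-3"]
without_newline = read_terminated_records(write_gzipped(records, trailing_newline=False))
with_newline = read_terminated_records(write_gzipped(records))
```

Under that assumption, `without_newline` is missing the final record, while `with_newline` contains all three.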

2.2 Support for Kinesis in the China region

The Kinesis endpoint in China (kinesis.cn-north-1.amazonaws.com.cn) doesn’t conform to the usual endpoint format (kinesis.<region>.amazonaws.com). As a result, it was not supported by Kinesis S3.

This has now been fixed by Bob Xiao - thanks Bob!
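
The difference between the two endpoint formats can be sketched as follows. This is a minimal Python illustration, not how Kinesis S3 itself builds endpoints; `kinesis_endpoint` is a hypothetical helper, and the rule that regions starting with `cn-` use the `amazonaws.com.cn` domain is the only logic it encodes.

```python
def kinesis_endpoint(region):
    """Build the Kinesis endpoint host name for an AWS region."""
    # Regions in the China partition (cn-*) use the amazonaws.com.cn domain.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"kinesis.{region}.{domain}"


china_endpoint = kinesis_endpoint("cn-north-1")
default_endpoint = kinesis_endpoint("us-east-1")
```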

2.3 Resolving environment variables in the configuration

From now on, you will be able to include environment variables in your configuration file, like so:

sink {
  aws {
    access-key: "${AWS_ACCESS_KEY_ID}"
    secret-key: "${AWS_SECRET_ACCESS_KEY}"
  }
  // ...
}

This feature was contributed by Shin Nien, thanks a lot!
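
To give a feel for what resolving such placeholders involves, here is a minimal Python sketch of the general idea. It is not the project's implementation: `resolve_env` is a hypothetical helper, and the choice to fail loudly when a variable is unset is an assumption made for this example.

```python
import os
import re

# Matches ${NAME}, where NAME looks like a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(text, env=None):
    """Replace each ${NAME} in text with env[NAME]; raise if NAME is unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


# The key value below is a made-up placeholder, not a real credential.
resolved = resolve_env(
    'access-key: "${AWS_ACCESS_KEY_ID}"',
    {"AWS_ACCESS_KEY_ID": "EXAMPLEKEY"},
)
```

Passing an explicit `env` mapping, as above, keeps the sketch testable without touching the real process environment.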

3. Project modernization

We also took advantage of this release to do a full refresh on Kinesis S3, which translates into:

4. Roadmap

Going forwards, we’re interested in exploring support for additional formats, such as Apache Parquet or Apache Avro.

Additionally, we want to extend the existing support for LZO, providing a way to store LZO files without their indices.

Finally, we’d like to reduce Kinesis S3’s coupling with Kinesis by supporting NSQ. This is part of an ongoing effort to have the Snowplow platform become cloud-agnostic. At that point we will rename this project to the Snowplow S3 Loader.

5. Contributing

You can check out the repository if you’d like to get involved! In particular, if there is an issue in the 0.6.0 milestone that catches your eye, don’t hesitate to add a comment saying that you’ve started working on it.