We are pleased to release version 0.6.0 of Snowplow S3 Loader, formerly known as Kinesis S3, our project dedicated to storing data, including Snowplow raw and enriched event streams, to Amazon S3.
This post will cover:
1. NSQ Support
This release introduces NSQ as an event source – it is for this reason that we have renamed the project from Kinesis S3. Adding NSQ support to the Snowplow S3 Loader is another step towards migrating Snowplow Mini to use NSQ, and it also enables Snowplow Mini to back up its event stream to S3.
Assuming you already have NSQ set up, you will need to make some changes to the S3 Loader’s configuration file in order to sink your NSQ events into an S3 folder:
- Change the “source” and “sink” values to “nsq”
- Complete the NSQ section of the config per your NSQ setup
- Change the stream names to match your NSQ topic names
2. Support for “AT_TIMESTAMP” as initial position
Prior to this release, there were two possible configuration options to serve as the initial processing position for the Kinesis stream: LATEST and TRIM_HORIZON.
These determine what happens on the first run of the application, when the Kinesis Client Library inside the S3 Loader creates a DynamoDB table to track what it has consumed from the stream so far. On that first run, the loader would start consuming either from LATEST (the most recent record in the stream) or from TRIM_HORIZON (the oldest record available in the stream).
This release adds a third option for the initial position, AT_TIMESTAMP. With the AT_TIMESTAMP option, consuming will start from the specified timestamp. To use AT_TIMESTAMP as an initial position, you should change the initialPosition and initialTimestamp fields in the configuration: initialPosition should be AT_TIMESTAMP, and the initialTimestamp field must be set to the point in time at which message consumption will begin. This timestamp needs to follow the “yyyy-MM-ddTHH:mm:ssZ” format. You can get more information about initial positions from our own guide to Kinesis at-least-once processing.
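If you generate the initialTimestamp value from a script, a small helper can keep it in the expected shape. The following is a minimal sketch rather than part of the S3 Loader: the function names are hypothetical, and it assumes the trailing “Z” in the format is a literal marker for UTC, so the input is converted to UTC before formatting.

```python
from datetime import datetime, timezone

# Python rendering of the "yyyy-MM-ddTHH:mm:ssZ" pattern.
# Assumption: the trailing "Z" is a literal character denoting UTC.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_initial_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC string for initialTimestamp."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def is_valid_initial_timestamp(value: str) -> bool:
    """Return True if the string parses under TIMESTAMP_FORMAT."""
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False
```

For example, to_initial_timestamp(datetime(2017, 5, 17, 10, 0, 0, tzinfo=timezone.utc)) returns "2017-05-17T10:00:00Z", which is the value you would then place in the initialTimestamp field.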
Two important changes have been made to how you run the Snowplow S3 Loader.
3.1 Non-executable JARs
From now on, the produced artifacts will be non-executable JAR files. We found that sbt-assembly, the plugin we use to build fat JARs, was producing executable but unfortunately corrupt JAR files, hence this change.
As a result, you’ll now have to launch the loader like so:
java -jar snowplow-s3-loader-0.6.0.jar --config my.config
3.2 Configuration changes
There have been quite a few changes made to the configuration expected by the S3 Loader. Please check out the example configuration file in the repository to make sure that your configuration file has the expected parameters.