Elasticsearch Loader 0.10.0 released

12 September 2017  •  Enes Aldemir

We are thrilled to announce version 0.10.0 of the Snowplow Elasticsearch Loader, our application for writing Snowplow enriched events and more to Elasticsearch.

In this post, we will cover:

  1. NSQ support
  2. Support for writing raw JSONs
  3. Support for “AT_TIMESTAMP” as initial position
  4. Configuration changes
  5. Contributing

1. NSQ support

With this release, we are adding support for NSQ as an event source: the loader can now sink Snowplow enriched events from an NSQ topic to Elasticsearch.

NSQ is a real-time distributed messaging platform. For more information on getting started, read the NSQ installation guide. We are planning on migrating Snowplow Mini to use NSQ under the hood, and so this new functionality is a stepping stone to this goal.

Assuming you have NSQ set up already, you will need to make some changes to the Elasticsearch Loader’s configuration file:

  • Change the “source” field’s value to “nsq”
  • Complete the NSQ section of the config per your NSQ setup
  • Change the stream names to match your NSQ topic names

2. Support for writing raw JSONs

We’re keen to widen the community using the Snowplow loaders, and in support of this we have added the ability to write non-Snowplow JSON payloads to Elasticsearch.

You can think of this as an open-source alternative to Amazon Kinesis Firehose, but much more flexible: it works with NSQ, Kinesis and stdin sources, and with Elasticsearch clusters other than Amazon Elasticsearch Service.

To write non-Snowplow JSONs to Elasticsearch, just change the enabled field to “plain-json” in the config file.
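
As a rough illustration of preparing such payloads, the sketch below serializes arbitrary records as one compact JSON document per line, which could then be piped into a loader configured with the stdin source. It assumes the stdin source treats each input line as one record; the helper name to_json_lines and the sample fields are hypothetical and not part of the loader.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact, single-line JSON.

    json.dumps escapes embedded newlines, so every returned string
    occupies exactly one line (assumption: one line = one record).
    """
    return [
        json.dumps(record, separators=(",", ":"), sort_keys=True)
        for record in records
    ]


if __name__ == "__main__":
    # Hypothetical sample payloads, for illustration only.
    sample = [
        {"user": "alice", "action": "login"},
        {"user": "bob", "action": "logout"},
    ]
    for line in to_json_lines(sample):
        print(line)
```

Keeping each document on a single line avoids a multi-line JSON payload being split into several malformed records.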

3. Support for “AT_TIMESTAMP” as initial position

Prior to this release, there were two possible configuration options to serve as the initial processing position for the Kinesis stream: TRIM_HORIZON and LATEST.

These options determine what happens on the first run of the application, when the Kinesis Connectors Library inside the Elasticsearch Loader creates a DynamoDB table to track what it has consumed from the stream so far. On that first run, the loader starts consuming either from LATEST (the most recent record in the stream) or from TRIM_HORIZON (the oldest record available in the stream).

This release adds a third option for the initial position, AT_TIMESTAMP, which starts consumption from a specified point in time. To use it, set the initialPosition field in the configuration to AT_TIMESTAMP and set the initialTimestamp field to the point in time at which message consumption should begin. This timestamp must follow the “yyyy-MM-ddTHH:mm:ssZ” format. For more information about initial positions, see our own guide to Kinesis at-least-once processing.
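
To make the timestamp format concrete, here is a minimal sketch that produces and checks values of that shape before they go into the configuration file. It assumes the trailing “Z” can be written as a literal UTC marker (for example 2017-09-12T08:30:00Z); the function names are hypothetical helpers, not part of the loader.

```python
from datetime import datetime, timezone

# strftime/strptime equivalent of the "yyyy-MM-ddTHH:mm:ssZ" pattern,
# assuming the trailing "Z" is a literal meaning UTC (an assumption).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_initial_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC timestamp string."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_initial_timestamp(value: str) -> datetime:
    """Parse a timestamp string; raises ValueError if it is malformed."""
    parsed = datetime.strptime(value, TIMESTAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    start = datetime(2017, 9, 12, 8, 30, 0, tzinfo=timezone.utc)
    print(format_initial_timestamp(start))
```

Checking the string up front catches a malformed value before the loader is started with it.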

4. Configuration changes

There have been some notable changes to the configuration file format expected by the Elasticsearch Loader.

Please check out the example configuration file in the repository to make sure that your configuration file has all the expected parameters.

5. Contributing

You can check out the repository and the open issues if you’d like to get involved!