Elasticsearch Loader 0.9.0 released

We are thrilled to announce version 0.9.0 of Elasticsearch Loader, our component that lets you sink your Kinesis streams of Snowplow enriched events to Elasticsearch.

This release adds support for Elasticsearch 5 and other important features such as the possibility to use SSL when relying on the REST API of Elasticsearch and the ability to sign requests when using Amazon Elasticsearch Service.

In this post, we will cover:

  1. Support for Elasticsearch 5
  2. Security features
  3. Bug fixes
  4. Project modernization
  5. Upgrading
  6. Roadmap
  7. Contributing

1. Support for Elasticsearch 5

In version 0.8.0, we used to provide two artifacts: one for Elasticsearch 1.x and one for Elasticsearch 2.x, both supporting the REST API as well as the Transport API.

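By contrast, version 0.9.0 is split into three artifacts: one for the HTTP API, and two for the Transport API (one for Elasticsearch 2.x and one for Elasticsearch 5.x).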

You’ll notice that support for the Transport API for Elasticsearch 1.x has been dropped in this release. Maintaining three different artifacts for the Transport API was becoming a major effort.

Furthermore, as Elasticsearch is planning to phase out the Transport API, we will also gradually wind down our own work on it. For the discussion on removing Transport API support altogether, see issue 53.

2. Security features

This release also brings two important security features: SSL support, and the ability to sign AWS requests when using Amazon Elasticsearch Service. Both features apply only to the HTTP API.

2.1 HTTPS support

You can now use TLS with the HTTP API by setting the following configuration parameter:

elasticsearch {
  client {
    ssl: true
  }
}

HTTPS support was contributed by Simon Frid; many thanks, Simon!
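To illustrate what the ssl flag changes on the wire, here is a minimal Python sketch that builds an Elasticsearch bulk request whose URL scheme depends on that flag. This is not the loader's code (the loader is a JVM application built with sbt); the function names, the application/x-ndjson content type and the use of Python's standard urllib and ssl modules are assumptions made for illustration only.

```python
import json
import ssl
import urllib.request


def bulk_request(host: str, port: int, use_ssl: bool, index: str,
                 doc_type: str, docs: list) -> urllib.request.Request:
    """Build (but do not send) a _bulk request; use_ssl mirrors the ssl: true setting.

    Hypothetical helper for illustration, not part of Elasticsearch Loader.
    """
    scheme = "https" if use_ssl else "http"
    # The bulk body is newline-delimited JSON: one action line, then the document.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    body = ("\n".join(lines) + "\n").encode("utf-8")  # the bulk API expects a trailing newline
    return urllib.request.Request(
        f"{scheme}://{host}:{port}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},  # assumed content type
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    # The default context verifies the server certificate and hostname for HTTPS URLs.
    context = ssl.create_default_context()
    with urllib.request.urlopen(req, context=context) as resp:
        return json.loads(resp.read())
```

With use_ssl set to False the same request goes out over plain HTTP; only the scheme changes, which is why a single boolean is enough in the configuration.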

2.2 Signing AWS requests

This release also adds the ability to sign your HTTP requests to Amazon Elasticsearch Service, following the Signature Version 4 process described in the AWS documentation. To enable this feature, add the following to your configuration file:

elasticsearch {
  aws {
    signing: true
  }
}
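For readers curious about what "signing" involves, the following is a minimal Python sketch of AWS Signature Version 4 for a request to Amazon Elasticsearch Service (service name "es"). It is not the loader's implementation. It assumes the request path is already canonical (for example /_bulk), signs only the host and x-amz-date headers, and ignores session tokens for temporary credentials; the function names are hypothetical.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # SigV4 derives a per-day, per-region, per-service key through a chain of HMACs.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(method: str, host: str, path: str, query: dict, body: bytes,
                 access_key: str, secret_key: str, region: str,
                 service: str = "es", now: Optional[datetime.datetime] = None) -> dict:
    """Return Host, X-Amz-Date and Authorization headers for a SigV4-signed request."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Canonical query string: URI-encode names and values, then sort by name.
    encoded = sorted((quote(k, safe="-_.~"), quote(v, safe="-_.~")) for k, v in query.items())
    canonical_query = "&".join(f"{k}={v}" for k, v in encoded)
    # Canonical headers: lowercase names, sorted, each line newline-terminated.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method, path, canonical_query, canonical_headers, signed_headers,
        hashlib.sha256(body).hexdigest(),
    ])

    # String to sign: algorithm, timestamp, credential scope, hash of canonical request.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                     f"SignedHeaders={signed_headers}, Signature={signature}")
    return {"Host": host, "X-Amz-Date": amz_date, "Authorization": authorization}
```

Because the payload hash is part of the signature, every bulk request has to be signed individually; the derived key, however, only changes with the date, region and service.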

3. Bug fixes

3.1 Buffer size

In Elasticsearch Loader, records from Kinesis are buffered before being sent to Elasticsearch. Prior to 0.9.0, a single record larger than the entire buffer was mishandled and caused an exception to be thrown. This has now been fixed, thanks to Adam Gray.
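The edge case is easy to see in a byte-bounded buffer. Below is a minimal Python sketch of one possible way to handle it: when a record alone exceeds the buffer's byte limit, flush whatever is pending and then send that record as its own single-record batch instead of raising. The RecordBuffer class and its behaviour are hypothetical and do not claim to reproduce the loader's actual fix.

```python
from typing import Callable, List


class RecordBuffer:
    """Hypothetical byte-bounded buffer that tolerates records larger than itself."""

    def __init__(self, byte_limit: int, flush_fn: Callable[[List[bytes]], None]):
        self.byte_limit = byte_limit
        self.flush_fn = flush_fn
        self.records: List[bytes] = []
        self.size = 0

    def add(self, record: bytes) -> None:
        if len(record) > self.byte_limit:
            # Oversized record: emit pending records first, then send it on its own.
            self.flush()
            self.flush_fn([record])
            return
        if self.size + len(record) > self.byte_limit:
            # This record would overflow the buffer, so flush before adding it.
            self.flush()
        self.records.append(record)
        self.size += len(record)

    def flush(self) -> None:
        if self.records:
            self.flush_fn(self.records)
            self.records = []
            self.size = 0
```

For example, with a 10-byte limit, adding b"aaaa", b"bbbb", b"cccc" and a 25-byte record yields the batches [b"aaaa", b"bbbb"], [b"cccc"] and then the oversized record alone.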

3.2 Maximum payload size

Another issue, affecting only old (1.x) versions of Elasticsearch, was that payloads were limited to 32 kB, so Elasticsearch would reject any record exceeding that size.

As a result, this rejected record would end up in the bad Kinesis stream where errors are recorded. However, if that same record also exceeded the maximum size of a Kinesis record (1 MB), then it would be silently lost.

We fixed this issue by truncating the payload contained in the error report sent to the bad Kinesis stream, so that the report always fits within the maximum Kinesis record size.
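Here is a minimal Python sketch of that idea: serialize the error report, and if it exceeds the size cap, shorten the embedded payload and try again, cutting on a byte boundary without leaving a partial UTF-8 character. The report layout (the "line" and "errors" fields), the bad_row function and the 1,000,000-byte cap are assumptions for illustration, not the loader's actual format or constant.

```python
import json

# Assumption: a conservative cap kept under the 1 MB Kinesis record limit.
MAX_RECORD_BYTES = 1_000_000


def bad_row(payload: str, errors: list, limit: int = MAX_RECORD_BYTES) -> bytes:
    """Serialize a hypothetical error report, truncating payload so it fits in limit bytes."""
    raw = payload.encode("utf-8")
    budget = len(raw)
    while True:
        # Cut the raw bytes, dropping any trailing partial UTF-8 character.
        line = raw[:budget].decode("utf-8", errors="ignore")
        report = json.dumps({"line": line, "errors": errors}, ensure_ascii=False).encode("utf-8")
        overshoot = len(report) - limit
        if overshoot <= 0:
            return report
        if budget == 0:
            raise ValueError("error messages alone exceed the record size limit")
        # Each raw byte removed shrinks the JSON by at least one byte, so this converges.
        budget = max(0, budget - overshoot)
```

Truncating only the payload, rather than dropping the report, keeps the error messages intact so the failure can still be diagnosed from the bad stream.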

4. Project modernization

As with our Kinesis S3 0.5.0 release, we took advantage of this release to overhaul the Elasticsearch Loader project by:

We also took this opportunity to move this codebase out of the core snowplow/snowplow repository and into a dedicated snowplow/snowplow-elasticsearch-loader repository.

5. Upgrading

A couple of changes have been made to how you run the Snowplow Elasticsearch Loader.

5.1 Non-executable JARs

From now on, the produced artifacts will be non-executable JAR files. We found that sbt-assembly, the plugin we use to build fat JARs, was producing executable but unfortunately corrupt JAR files.

As a result, you’ll now have to launch the loader like so:

java -jar snowplow-elasticsearch-loader-http-0.9.0.jar --config my.conf

5.2 Configuration changes

There have been quite a few changes made to the configuration expected by the Elasticsearch Loader. Please check out the example configuration file in the repository to make sure your configuration file has the expected parameters.

6. Roadmap

As mentioned earlier in this post, we’re planning on phasing out support for the Transport API; see issue 53 for more details.

Additionally, we’d like to extend the scope of this project by adding the ability to write non-Snowplow events to Elasticsearch as plain JSON or Avro records.

We are overhauling Snowplow Mini to use NSQ instead of Unix named pipes internally; as part of this we will need to extend Elasticsearch Loader to be able to read events from NSQ (see issue 59).

Finally, we’re also keen on being able to read events from Apache Kafka in addition to Kinesis, perhaps leveraging the Kafka Connect project for Elasticsearch; this is discussed in issue 18.

7. Contributing

You can check out the repository and the open issues if you’d like to get involved!

Thoughts or questions? Come join us in our Discourse forum!

Ben Fradet

Ben is a data engineer at Snowplow. You can find him on GitHub, Twitter and LinkedIn.