Kinesis Tee 0.1.0 released for Kinesis stream filtering and transformation
We are pleased to announce the release of version 0.1.0 of Kinesis Tee.
Kinesis Tee is like Unix tee, but for Kinesis streams. You can use it to:
- Write a Kinesis stream to another Kinesis stream (in the same region, or a different AWS account/region)
- Transform the format of a Kinesis stream
In the rest of this post we will cover:
- Introducing Kinesis Tee
- Example: mirroring a Kinesis stream to another account
- Example: converting Snowplow enriched events to nested JSON
- Example: filtering records in real time
- Getting help
1. Introducing Kinesis Tee
The core purpose of Kinesis Tee is to connect two or more Kinesis streams together. These streams can be in different regions, or even located in different AWS billable accounts and regions.
Kinesis Tee is an AWS Lambda function that is triggered when events are received in the Kinesis source stream.
When traffic in your Kinesis source stream triggers Kinesis Tee, the Lambda function looks up a configuration file in DynamoDB (See: Configuration) and uses it to determine the action to take. This configuration is a self-describing Avro containing:
- A single sink stream to write records to
- An optional stream transformer to convert the records to another supported format
- An optional steam filter to determine whether to write the records to the sink stream
The stream transformer section of the configuration gives you the ability to modify the records as they are passed from the source stream to the sink stream. Currently, only converting Snowplow Enriched Events (TSV format) to (nested) JSON is supported. This is done using the Snowplow Scala Analytics SDK.
Let’s go through some brief examples showing the power of Kinesis Tee.
2. Example: mirroring a Kinesis stream to another account
A common use for Kinesis Tee is mirroring a Kinesis stream to another AWS account, perhaps in another region.
In order to use Kinesis Tee as a pass-through (no filter/record changes) to another account, the following configuration can be used:
<<ADD HERE>> in the above example with your AWS credentials.
3. Example: converting Snowplow enriched events to nested JSON
Built in to Kinesis Tee is a “Snowplow to nested JSON” transformer. This converts Snowplow enriched events (TSV) into (nested) JSON using the Snowplow Scala Analytics SDK. Here’s an example:
Currently only the
SNOWPLOW_TO_NESTED_JSON transformation, or
4. Example: filtering records in real time
Imagine we have a Kinesis stream consisting of JSON monitoring events emitted by all the machines on a factory floor. The events look like this:
We want to pass on events indicating that a machine is overheating rapidly to a dedicated Kinesis stream:
filter, and in this case decodes to the following:
This filter will only send a record from the Kinesis source stream into the sink stream if:
- The record is parseable as JSON
- The record contains numeric
- The difference between
lastTempis greater than 10 degrees
All other records will be silently discarded by Kinesis Tee.
Kinesis Tee uses Snowplow community member Jorge Bastida’s excellent Gordon project for deployment. Find out more on setting up Kinesis Tee, plus more configuration examples, in:
If you want to understand the architecture of Kinesis Tee, perhaps with a view to contributing to the codebase, check out this guide:
We see Kinesis Tee as a fundamental building block for assembling asynchronous micro-service architectures on top of Kinesis. We have big plans for Kinesis Tee, including but not limited to:
- Adding daisychaining of transformation and filtering steps - allowing for multiple transformations, and filters based on each transformation (issue #18)
- Adding routing, so that a given event can end up in one or more target streams (issue #8)
- Adding better error handling, so that processing failures can be properly captured in a “bad stream” (issue #11)