NSQ is a realtime distributed messaging platform - think of it as a highly-scalable pub/sub system. We are planning on migrating Snowplow Mini to use NSQ under the hood, and so this new functionality is a stepping stone towards this goal.
At the moment, Snowplow Mini uses named Unix pipes “under the hood” for communicating between the various Snowplow components. This is an opaque and fairly brittle process - leading to unexpected behaviours such as backpressure issues and race conditions when launching. Switching Snowplow Mini to use NSQ is a good compromise: much simpler to setup than Kafka or Kinesis, but much more predictable than named Unix pipes.
Additionally, being highly scalable and relatively low-cost may make NSQ an important alternative to Kafka or Kinesis for some large-scale Snowplow roll-outs, particularly around the IoT space.
Adding NSQ support in Snowplow translates to:
We will detail both those steps below, but first let’s setup NSQ.
The easiest way to spin up NSQ is through the NSQ quick start. For our purposes, we only need
nsqlookupd is a component dedicated to managing who produces and consumes what.
nsqd, on the other hand, is in charge of receiving, queueing and delivering messages.
After starting both
nsqd, you can send the following
POST requests in order to create the NSQ topics that we will use later on.
Assuming all these commands run without error, you are ready to continue with the next steps.
This release brings support for a new sink target for our Scala Stream Collector, in the form of a NSQ topic. This feature maps one-to-one in functionality with the current Kinesis and Kafka offerings.
If you have followed the setting up NSQ section you would need to update your Scala Stream Collector configuration to the following:
Launching the collector in this configuration will then start sinking raw events to your configured NSQ topic, allowing them to be picked up and consumed by other applications, including Stream Enrich.
Stream Enrich has also been updated to support a NSQ topic as a source, and another one as a sink. Again, this feature maps one-to-one in functionality with the current Kinesis and Kafka offerings. If you are familiar with our Kinesis or Kafka support, you know the drill!
Following on from the Stream Collector section above, you can then configure your Stream Enrich application like so:
Events from the Stream Collector’s raw topic will then start to be picked up and enriched before being written to the
As we previously mentioned, the primary purpose of the NSQ support is Snowplow Mini’s migration. In support of that, we have already added NSQ support to the Elasticsearch Loader and S3 Loader.
You can find more detailed information about these versions in the ElasticSearch Loader 0.10.0 and the S3 Loader 0.6.0 blog posts.
The real-time applications for R96 Zeugma are available at the following locations:
To use NSQ, you will need to make the changes to the configurations of the Stream Collector and Stream Enrich as specified in the above sections to use NSQ.
If you are already using Kafka or Kinesis: there are no breaking changes in the R96 confguration for Stream Enrich, but you will need to update your Scala Stream Collector’s configuration. This is because only one sink configuration is needed from now on.
For example, if you’re using Kinesis only the Kinesis configuration will be needed:
Finally, an upcoming release of the Snowplow Docker images will include images for both the Scala Stream Collector and Stream Enrich with NSQ support.
There are no material non-NSQ-related changes in R96.
Upcoming Snowplow releases will include:
And of course, please stay tuned for the Snowplow Mini 0.4.0 release with NSQ support!
For more details on this release, please check out the release notes on Github.
If you have any questions or run into any problems, please visit our Discourse forum.