We are pleased to announce the release of our new AWS Lambda Node.js Example Project!
This example project can help you jumpstart your own real-time event processing pipeline, without having to set up and manage clusters of server infrastructure. We will take you through the steps to get this simple analytics-on-write job set up and processing your Kinesis event stream.
Read on after the fold for:
- What is AWS Lambda and Kinesis?
- Introducing analytics-on-write
- Detailed setup
- Further reading
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. In this project we leverage the integration between the Kinesis and Lambda services.
This is an example of the “pull” model where AWS Lambda polls the Amazon Kinesis stream and invokes your Lambda function when it detects new data on the stream.
Our AWS Lambda reads a Kinesis stream containing events in a JSON format:
Our Node.js Lambda counts the events by type and aggregates these counts into 1-minute buckets. The job then takes these aggregates and saves them into a table in DynamoDB:
The most complete open-source example of an analytics-on-write implementation is Ian Meyers’ amazon-kinesis-aggregators project; our example project is in turn heavily influenced by the concepts in Ian’s work. Three important concepts to understand in analytics-on-write are:
- Downsampling, where we reduce the event’s ISO 8601 timestamp down to minute precision, so for instance “2015-06-05T12:54:43.064528” becomes “2015-06-05T12:54:00.000000”. This downsampling gives us a fast way of bucketing or aggregating events via this downsampled key
- Bucketing is an aggregation technique that builds buckets, where each bucket is associated with a downsampled timestamp key and an event type criterion. By the end of the aggregation process, we’ll end up with a list of buckets, each one with a countable set of events that “belong” to it
- Atomic Increment is useful for updating values as they change because multiple requests from your application won’t collide. If your application needs to increase a count by 100, you can just tell Amazon DynamoDB to automatically increment the count by 100 as opposed to having to get the record, increment the count, and put it back into Amazon DynamoDB
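The first two concepts can be sketched in a few lines of Node.js. This is an illustration under stated assumptions rather than the project's actual code: the function names are hypothetical, events are assumed to carry `timestamp` and `type` fields, and timestamps are assumed to always use the fixed-width format shown above, with no time zone suffix.

```javascript
// Downsampling: truncate an ISO 8601 timestamp to minute precision.
// Working on the string avoids time zone conversion and keeps the
// six-digit fractional seconds used in the article's example.
function downsample(timestamp) {
  // "2015-06-05T12:54:43.064528" -> "2015-06-05T12:54:00.000000"
  return timestamp.slice(0, 16) + ':00.000000';
}

// Bucketing: count events per (downsampled timestamp, event type) pair.
// Assumption: event types never contain the "|" separator.
function bucket(events) {
  var counts = {};
  events.forEach(function (e) {
    var key = downsample(e.timestamp) + '|' + e.type;
    counts[key] = (counts[key] || 0) + 1;
  });
  return Object.keys(counts).map(function (key) {
    var parts = key.split('|');
    return { timestamp: parts[0], eventType: parts[1], count: counts[key] };
  });
}
```

Calling `bucket` on a batch of decoded events yields one object per bucket, which is the unit that then gets written to DynamoDB.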
In this tutorial, we’ll walk through the process of getting up and running with Amazon Kinesis and AWS Lambda Service. You will need git, Vagrant and VirtualBox installed locally. This project is specifically configured to run in AWS region “us-east-1” to ensure all AWS services are available.
Step 1: Build the project
First clone the repo and bring up Vagrant:
Before we go any further we will have to set up our project environment. We will install Grunt and the project dependencies with the commands below:
Step 2: Add AWS credentials to the vagrant box
You’re going to need IAM-based credentials for AWS. Get your keys and type in “aws configure” in the Vagrant box (the guest). In the example below, I’m also setting the region to “us-east-1” and the output format to “json”:
Step 3: Create your DynamoDB, IAM Role, and Kinesis stream
We’re going to set up a DynamoDB table, IAM role (via CloudFormation), and a Kinesis stream. We will be using Grunt to run all of our tasks. I’m using “my-table” as the table name. The CloudFormation stack name is “kinesisDynamo” and the Kinesis stream name is “my-stream”. We will kick off all of the tasks with the grunt init command:
Step 4: Build Node.js project and configure AWS Lambda
Grunt can also package our project’s code into dist/aws-lambda-example-project_0-1-0_latest.zip; this task also attaches the IAM role to AWS Lambda.
Invoke the task with:
Step 5: Deploy the package to AWS Lambda
Deploy this project to Lambda with the grunt deploy command:
Step 6: Associate our Kinesis stream to our Lambda
Our Lambda function reads incoming event data and logs some of the information to Amazon CloudWatch. AWS Lambda polls the Amazon Kinesis stream and invokes your Lambda function when it detects new data on the stream. We need to “connect” or “associate” our Lambda function to Kinesis by:
Step 7: Generate events in your Kinesis stream
The final step for testing this project is to start sending some events to our new Kinesis stream. We have created a helper method to do this – run the below and leave it running in a separate terminal:
Step 8: Inspect the “my-table” aggregates in DynamoDB
Success! You can now see data being written to the table in DynamoDB. Make sure you are in the correct AWS region, then click on my-table and hit the Explore Table button:
For each Timestamp and EventType pair, we see a Count, plus some CreatedAt and UpdatedAt metadata for debugging purposes. Our bucket size is 1 minute, and we have 5 discrete event types, hence the matrix of rows that we see.
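For readers curious how a row like this can be kept up to date without a read-modify-write cycle, here is a hedged sketch of the parameters for a DynamoDB UpdateItem call that atomically adds to Count. The attribute names come from the table above; the function name, the assumption that Timestamp and EventType together form the table's key, and the use of an update expression are ours, and the project itself may build its request differently.

```javascript
// Build UpdateItem parameters (low-level attribute-value format) that
// atomically add bucket.count to the stored Count for one bucket.
// Assumption: Timestamp and EventType form the table's primary key.
function toUpdateParams(tableName, bucket, now) {
  return {
    TableName: tableName,
    Key: {
      Timestamp: { S: bucket.timestamp },
      EventType: { S: bucket.eventType }
    },
    // ADD increments Count on the server, creating it if absent;
    // if_not_exists preserves the original CreatedAt on later updates.
    UpdateExpression:
      'SET #created = if_not_exists(#created, :now), #updated = :now ADD #count :n',
    // Placeholders sidestep DynamoDB reserved words such as COUNT.
    ExpressionAttributeNames: {
      '#count': 'Count',
      '#created': 'CreatedAt',
      '#updated': 'UpdatedAt'
    },
    ExpressionAttributeValues: {
      ':n': { N: String(bucket.count) },
      ':now': { S: now }
    }
  };
}
```

The resulting object could be passed to the AWS SDK's `updateItem` call on a DynamoDB client. Because the increment happens server-side, concurrent Lambda invocations updating the same bucket do not overwrite each other's counts.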
Step 9: Shut everything down
Remember to shut off:
- grunt events loading script
- Delete your
- Delete your
- Delete your
- Delete your ProcessingKinesisLambdaDynamoDB function in AWS Lambda
- Delete your CloudWatch logs associated with the Lambda function
- Exit your Vagrant guest
This is a short list of our most frequently asked questions.
I got a credentials error running the
This project requires configuration of AWS credentials. Read more about AWS credentials (http://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html); configure your AWS credentials using the AWS CLI like so:
I found an issue with the project:
This example project is a direct port of our Spark Streaming Example Project – if you are interested in Spark Streaming or Scala, definitely check it out!
Both example projects are based on an event processing technique called analytics-on-write. We are planning on exploring these techniques further in a new project, called Icebucket. Stay tuned for more on this!