AWS Lambda can help you jump-start your own real-time event processing pipeline, without having to set up and manage clusters of server infrastructure. We will take you through the steps to get this simple analytics-on-write job set up and processing your Kinesis event stream.
Read on after the fold for:
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. AWS Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device. You can also use AWS Lambda to create new back-end services where compute resources are automatically triggered based on custom requests.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. In this project we leverage the integration between the Kinesis and Lambda services.
This is an example of the “pull” model where AWS Lambda polls the Amazon Kinesis stream and invokes your Lambda function when it detects new data on the stream.
Our AWS Lambda function reads a Kinesis stream containing events in JSON format:
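The project's exact event schema isn't reproduced here; as an illustrative sketch only, an event on the stream might look like the following (the field names and values are our assumption, chosen to match the Timestamp and EventType attributes we will see later in DynamoDB):

```json
{
  "timestamp": "2015-06-05T12:54:43.064Z",
  "type": "Green",
  "id": "4ec80fb1-0963-4e35-8f54-ce760499d974"
}
```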
Our Node.js Lambda function counts the events by type and aggregates these counts into 1-minute buckets. The job then takes these aggregates and saves them into a table in DynamoDB:
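As a minimal sketch of the bucketing step (not the project's actual Node.js code), the event's timestamp is floored to the start of its minute; that bucket, together with the event type, identifies the DynamoDB row whose count gets incremented:

```shell
# Floor an ISO 8601 timestamp to the start of its minute (GNU date).
# The resulting bucket plus the event type keys the DynamoDB row
# whose Count attribute the Lambda function increments.
bucket_for() {
  date -u -d "$1" +"%Y-%m-%dT%H:%M:00.000"
}

bucket_for "2015-06-05T12:54:43Z"   # prints 2015-06-05T12:54:00.000
```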
The most complete open-source example of an analytics-on-write implementation is Ian Meyers’ amazon-kinesis-aggregators project; our example project is in turn heavily influenced by the concepts in Ian’s work. Three important concepts to understand in analytics-on-write are:
In this tutorial, we’ll walk through the process of getting up and running with Amazon Kinesis and AWS Lambda. You will need git, Vagrant and VirtualBox installed locally. This project is specifically configured to run in the AWS region “us-east-1” to ensure that all required AWS services are available.
First clone the repo and bring up Vagrant:
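Assuming the repository is the aws-lambda-example-project under the snowplow GitHub organization (the URL below is our assumption), this looks like:

```shell
host$ git clone https://github.com/snowplow/aws-lambda-example-project.git
host$ cd aws-lambda-example-project
host$ vagrant up && vagrant ssh
```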
Before we go any further, we will have to set up our project environment. We will install Grunt and the project dependencies with the commands below:
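A typical setup for a Grunt-based Node.js project looks like the below, run inside the Vagrant guest (the exact commands may differ slightly from the project's README):

```shell
guest$ npm install -g grunt-cli
guest$ npm install
```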
You’re going to need IAM-based credentials for AWS. Get your keys and type in “aws configure” in the Vagrant box (the guest). In the below, I’m also setting the region to “us-east-1” and the output format to “json”:
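The AWS CLI will prompt for each value in turn:

```shell
guest$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-east-1
Default output format [None]: json
```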
We’re going to set up a DynamoDB table, an IAM role (via CloudFormation), and a Kinesis stream, using Grunt to run all of our tasks. I’m using “my-table” as the table name, “kinesisDynamo” as the CloudFormation stack name, and “my-stream” as the Kinesis stream name. We will kick off all of the tasks with the grunt init command:
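In the Vagrant guest:

```shell
guest$ grunt init
```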
Grunt can also package our project’s code into dist/aws-lambda-example-project_0-1-0_latest.zip; this task also attaches the IAM role to AWS Lambda.
Invoke the task with:
Deploy this project to Lambda with the grunt deploy command:
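In the Vagrant guest:

```shell
guest$ grunt deploy
```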
Our Lambda function reads incoming event data and logs some of the information to Amazon CloudWatch. AWS Lambda polls the Amazon Kinesis stream and invokes your Lambda function when it detects new data on the stream. We need to “connect” or “associate” our Lambda function to Kinesis by:
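This association can also be sketched with the AWS CLI. A hedged equivalent of the console steps, using the function name from the cleanup section below (the account ID in the stream ARN is a placeholder you must fill in):

```shell
aws lambda create-event-source-mapping \
  --function-name ProcessingKinesisLambdaDynamoDB \
  --event-source-arn arn:aws:kinesis:us-east-1:<account-id>:stream/my-stream \
  --starting-position TRIM_HORIZON
```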
The final step for testing this project is to start sending some events to our new Kinesis stream. We have created a helper method to do this - run the below and leave it running in a separate terminal:
Success! You can now see data being written to the table in DynamoDB. Make sure you are in the correct AWS region, then click on my-table and hit the Explore Table button:
For each Timestamp and EventType pair, we see a Count, plus some CreatedAt and UpdatedAt metadata for debugging purposes. Our bucket size is 1 minute, and we have 5 discrete event types, hence the matrix of rows that we see.
Remember to shut off:

- the ProcessingKinesisLambdaDynamoDB function in AWS Lambda
- the CloudWatch logs associated with the Lambda function
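A hedged sketch of the teardown using the AWS CLI, with the resource names used earlier in this guide (the log group name assumes Lambda's standard /aws/lambda/&lt;function&gt; convention):

```shell
aws lambda delete-function --function-name ProcessingKinesisLambdaDynamoDB
aws logs delete-log-group --log-group-name /aws/lambda/ProcessingKinesisLambdaDynamoDB
# The Kinesis stream and DynamoDB table created by grunt init also bill while they exist:
aws kinesis delete-stream --stream-name my-stream
aws dynamodb delete-table --table-name my-table
```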
This is a short list of our most frequently asked questions.
I got a credentials error running the project:

This project requires configuration of AWS credentials; configure your AWS credentials using the AWS CLI like so:
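In the Vagrant guest, as in the setup section above:

```shell
guest$ aws configure
# supply your Access Key ID and Secret Access Key when prompted,
# with region us-east-1 and output format json
```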
I found an issue with the project:
Feel free to get in touch or raise an issue on GitHub!
This example project is a direct port of our Spark Streaming Example Project - if you are interested in Spark Streaming or Scala, definitely check it out!
Both example projects are based on an event processing technique called analytics-on-write. We are planning on exploring these techniques further in a new project, called Icebucket. Stay tuned for more on this!