Huskimo 0.3.0 released: warehouse your Twilio telephony data in Redshift

Share

We are pleased to announce the release of Huskimo 0.3.0, for companies who use Twilio and would like to analyze their telephony data in Amazon Redshift, alongside their Snowplow event data.

For readers who missed our Huskimo introductory post: Huskimo is a new open-source project which connects to third-party SaaS platforms (Singular and now Twilio), exports their data via API, and then uploads that data into your Redshift instance.

Huskimo is a complement to Snowplow’s built-in webhook support – Huskimo exists because not all SaaS services offer webhooks which expose their internal data as a stream of events. Note that you do not need to use Snowplow to use Huskimo.

Read on after the jump for:

  1. Twilio support
  2. Other updates
  3. Running Huskimo
  4. Getting help
  5. Huskimo roadmap

1. Twilio support

Overview

Twilio is a cloud telephony service, used by many thousands of companies to develop and operate call, voicemail and texting systems.

Version 0.3.0 of Huskimo supports five resources made available through the Twilio API. These are set out in the following table:

Resource Fetch Table
Calls New based on StartTime twilio_calls
Messages New based on DateSent twilio_recordings
Recordings New based on DateCreated twilio_messages
IncomingPhoneNumbers All twilio_incoming_phone_numbers
pricing.PhoneNumbers All twilio_pricing_phone_numbers

For each resource type, Huskimo will retrieve records from the Twilio RESTful API, convert them into a simple TSV file format, and load them into Redshift. Note that we do not extract any sub-resources from these Twilio resources, so there are no child tables for any of these five resources in Redshift.

In the next section we will explain the algorithm used by Huskimo to extract this data from Twilio.

Extraction algorithm

Every time Huskimo runs to extract data from Twilio, it should:

  1. Connect to Twilio using the credentials in the configuration file
  2. For IncomingPhoneNumbers, fetch all data from that resource
  3. For Calls, Messages and Recordings, fetch all data for the given day
  4. For pricing.PhoneNumbers, fetch all data using a bespoke algorithm
  5. Upload the Twilio usage data into each Amazon Redshift database specified in the configuration file

Note that this behavior is different from how Huskimo extracts data for Singular: because marketing data is difficult to finalize, Huskimo fetches spend data from Singular for each of the past N days (the default is 30), every time Huskimo runs. By contrast we treat Twilio’s telephony data as “golden” as soon as it is available, and so there is no equivalent “lookback” approach for Huskimo’s treatment of Twilio.

Algorithm for pricing.PhoneNumbers

The algorithm for retrieving pricing.PhoneNumbers from Twilio is as follows:

  1. Do a GET to pricing.twilio.com/v1/PhoneNumbers/Countries to get a list of Twilio’s countries
  2. Loop through each country returned and do a GET to pricing.twilio.com/v1/PhoneNumbers/Countries/{Country}
  3. Flatten each entry in the phone_number_prices array returned into its own row in the output table

Thus the output table in Redshift table twilio_pricing_phone_numbers looks like this:

Estonia EE mobile 3.00 3.00 usd
Estonia EE national 1.00 1.00 usd
Estonia EE local 0.50 0.50 usd

2. Other updates

The following general fixes have been applied:

We have also made some updates to Huskimo’s Singular support:

3. Running Huskimo

Running Huskimo consists of four steps:

  1. Install Huskimo
  2. Write the Hukimo config file
  3. Deploy the Redshift tables
  4. Schedule Huskimo to run nightly

We’ll cover each of these steps briefly in the next section.

Install Huskimo

Huskimo is made available as an executable “fatjar” runnable on any Linux system. It is hosted on Bintray, download it like so:

$ wget http://dl.bintray.com/snowplow/snowplow-generic/huskimo_0.3.0.zip

Once downloaded, unzip it:

$ unzip huskimo_0.3.0.zip

Assuming you have a recent (Java 7 or 8) runtime on your system, running is as simple as:

$ ./huskimo --config my-config.yml

Write the Huskimo config file

Huskimo is configured using a YAML-format file which looks like this:

fetch: lookback: 30 # Number of days back in time from the start date to fetch for channels: - name: ADD HERE type: singular api_user: # Leave blank for Singular api_key: ADD HERE - name: ADD HERE type: twilio api_user: ADD HERE # Twilio account sid api_key: ADD HERE # Twilio auth token s3: access_key_id: ADD HERE secret_access_key: ADD HERE region: ADD HERE # Region bucket lives in bucket: ADD HERE # Must be s3:// not s3n:// for Redshift folder_path: ADD HERE targets: - name: ADD HERE type: redshift # Only Redshift support currently host: ADD HERE # The endpoint as shown in the Redshift console database: ADD HERE # Name of database port: 5439 # Default Redshift port ssl: false # SSL disabled by default ssl_factory: org.postgresql.ssl.NonValidatingFactory # Disable SSL certificate validation by default table: ADD HERE # TODO: move this out username: ADD HERE password: ADD HERE

Key things to note:

If you are upgrading from version 0.2.0 note the new fields:

  1. Under channels, api_user (leave blank for Singular)
  2. Under targets, the new fields ssl and ssl_factory to support the SSL security setting on Redshift databases

Deploy the Redshift tables

Before starting Huskimo you must deploy the relevant tables into Redshift. You can find the shared database setup in the file:

sql/common-redshift-ddl.sql

If you are extracting data from Twilio, run this script:

sql/twilio-redshift-ddl.sql

If you are extracting data from Singular, run this script:

sql/singular-redshift-ddl.sql

Make sure to deploy this file against each Redshift database you want to load Singular or Twilio data into.

Schedule Huskimo to run nightly

You are now ready to schedule Huskimo to run daily.

We typically run Huskimo in the early morning so that the data for yesterday is already available (even if rather incomplete). A cron entry for Huskimo might look something like this:

30 4 * * * /opt/huskimo/huskimo-0.3.0 --config /etc/huskimo.yml

4. Getting help

For more details on this release, please check out the Huskimo 0.3.0 on GitHub.

We will be building a dedicated wiki for Huskimo to support its usage; in the meantime, if you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.

5. Huskimo roadmap

We will be adding support for further SaaS platforms to Huskimo on a case-by-case basis.

We are particularly interested in adding support for more marketing channels, such as Google AdWords or Facebook. Having these datasets available in Redshift alongside your event data should enable some very powerful marketing attribution and return-on-spend analytics.

If you are interested in sponsoring a new integration for Huskimo, do get in touch!

Share

Related articles