For readers who missed our Huskimo introductory post: Huskimo is a new open-source project which connects to third-party SaaS platforms (Singular and now Twilio), exports their data via API, and then uploads that data into your Redshift instance.
Huskimo is a complement to Snowplow’s built-in webhook support – Huskimo exists because not all SaaS services offer webhooks which expose their internal data as a stream of events. Note that you do not need to use Snowplow to use Huskimo.
Read on after the jump for:
1. Twilio support
Twilio is a cloud telephony service, used by many thousands of companies to develop and operate call, voicemail and texting systems.
Version 0.3.0 of Huskimo supports five resources made available through the Twilio API. These are set out in the following table:
|Calls||New based on
|Messages||New based on
|Recordings||New based on
For each resource type, Huskimo will retrieve records from the Twilio RESTful API, convert them into a simple TSV file format, and load them into Redshift. Note that we do not extract any sub-resources from these Twilio resources, so there are no child tables for any of these five resources in Redshift.
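To make the "convert them into a simple TSV file format" step concrete, here is a minimal Python sketch. It is not Huskimo's actual code: the function name `records_to_tsv` and the column names in the usage note are illustrative assumptions, and the real column set for each table is defined by the Redshift table definitions.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def records_to_tsv(records: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Render flat API records as tab-separated lines, one line per record.

    `columns` fixes the field order (hypothetical names, chosen by the caller).
    Fields that are missing or None are written as empty strings.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for record in records:
        row = []
        for column in columns:
            value = record.get(column)
            row.append("" if value is None else value)
        writer.writerow(row)
    return buf.getvalue()
```

For example, calling `records_to_tsv([{"sid": "CA1", "status": "completed"}], ["sid", "status", "duration"])` yields a single line with three tab-separated fields, the last of them empty. Because no sub-resources are extracted, each record maps to exactly one line.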
In the next section we will explain the algorithm used by Huskimo to extract this data from Twilio.
Every time Huskimo runs to extract data from Twilio, it should:
- Connect to Twilio using the credentials in the configuration file
- For IncomingPhoneNumbers, fetch all data from that resource
- For Calls, Messages and Recordings, fetch all data for the given day
- For pricing.PhoneNumbers, fetch all data using a bespoke algorithm
- Upload the Twilio usage data into each Amazon Redshift database specified in the configuration file
Note that this behavior is different from how Huskimo extracts data for Singular: because marketing data is difficult to finalize, Huskimo fetches spend data from Singular for each of the past N days (the default is 30), every time Huskimo runs. By contrast we treat Twilio’s telephony data as “golden” as soon as it is available, and so there is no equivalent “lookback” approach for Huskimo’s treatment of Twilio.
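The difference between the two strategies can be sketched as the set of days each run asks for. This is an illustration only: the function names are hypothetical, and whether the Singular window includes the end day itself is an assumption made here for simplicity.

```python
from datetime import date, timedelta
from typing import List


def singular_days(end_day: date, lookback: int = 30) -> List[date]:
    """Days re-fetched on every run for Singular, oldest first.

    Assumption: the window is the `lookback` days ending on `end_day`.
    """
    return [end_day - timedelta(days=offset) for offset in range(lookback - 1, -1, -1)]


def twilio_days(day: date) -> List[date]:
    """Twilio data is treated as final, so only the given day is fetched."""
    return [day]
```

With a lookback of 30, every Singular day is fetched on 30 consecutive runs, so later corrections to spend data get picked up. Each Twilio day is fetched once.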
Algorithm for pricing.PhoneNumbers
The algorithm for retrieving pricing.PhoneNumbers from Twilio is as follows:
- Do a GET to pricing.twilio.com/v1/PhoneNumbers/Countries to get a list of Twilio’s countries
- Loop through each country returned and do a
- Flatten each entry in the phone_number_prices array returned into its own row in the output table
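The flattening step above can be sketched in Python as follows. This is a sketch under stated assumptions rather than Huskimo's implementation: the HTTP calls are left out, the function names are hypothetical, and apart from phone_number_prices the field names in the usage note are assumed for illustration. The sketch copies every other top-level field of the country record onto each row; the actual columns are whatever the Redshift table defines.

```python
from typing import Any, Dict, Iterable, List


def flatten_country(country: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one per-country pricing record into one row per price entry."""
    # Everything except the nested array is repeated on each output row.
    parent = {key: value for key, value in country.items() if key != "phone_number_prices"}
    return [{**parent, **price} for price in country.get("phone_number_prices", [])]


def flatten_all(countries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten the per-country records fetched in the loop into a single row list."""
    rows: List[Dict[str, Any]] = []
    for country in countries:
        rows.extend(flatten_country(country))
    return rows
```

A country record with two entries in phone_number_prices (say, one for mobile numbers and one for national numbers) produces two rows that share the same country-level fields. A country with an empty array produces no rows.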
Thus the output table in Redshift, twilio_pricing_phone_numbers, looks like this:
The following general fixes have been applied:
- We have added support for SSL-secured Redshift databases (#21)
- We have fixed a bug in deleteFromS3 where only the first 1000 files were deleted (#18)
- We split
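On the deleteFromS3 fix listed above: the release notes do not say what caused the bug, but one common way to hit a 1000-file ceiling is that S3 listing calls return at most 1,000 keys per response, so a single list-then-delete pass misses the rest. The Python sketch below shows the general paginate-until-done pattern. It is not Huskimo's code, and `list_page`, `delete_keys` and `fake_lister` are hypothetical stand-ins for real S3 client calls.

```python
from typing import Callable, List, Optional, Tuple

# A page is (keys on this page, token for the next page or None when done).
Page = Tuple[List[str], Optional[str]]


def delete_all(
    list_page: Callable[[Optional[str]], Page],
    delete_keys: Callable[[List[str]], None],
) -> int:
    """Keep listing and deleting until the listing reports no further pages."""
    deleted = 0
    token: Optional[str] = None
    while True:
        keys, token = list_page(token)
        if keys:
            delete_keys(keys)
            deleted += len(keys)
        if token is None:
            return deleted


def fake_lister(all_keys: List[str], page_size: int = 1000) -> Callable[[Optional[str]], Page]:
    """In-memory stand-in for a paged listing call, for trying the loop out."""
    def list_page(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(all_keys) else None
        return all_keys[start:end], next_token
    return list_page
```

Running `delete_all` against `fake_lister` with 2,500 keys takes three pages and deletes all 2,500, where a single-page version would stop at 1,000.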
We have also made some updates to Huskimo’s Singular support:
- Huskimo now allows
- We fixed macros in fetchAndWrite’s Exception (#16)
- Singular now only fetches channels of type
- We partially fixed an issue where Akka prevents clean exit on Exception (#1) – the remainder of the fix should come in 0.3.1 (#24)
Running Huskimo consists of four steps:
- Install Huskimo
- Write the Huskimo config file
- Deploy the Redshift tables
- Schedule Huskimo to run nightly
We’ll cover each of these steps briefly in the next section.
Huskimo is made available as an executable “fatjar” runnable on any Linux system. It is hosted on Bintray; download it like so:
Once downloaded, unzip it:
Assuming you have a recent (Java 7 or 8) runtime on your system, running is as simple as:
Write the Huskimo config file
Huskimo is configured using a YAML-format file which looks like this:
Key things to note:
- You can configure Huskimo to extract from one or more Singular or Twilio accounts
- You can configure Huskimo to write the extracted data to one or more Redshift databases
- Huskimo requires Amazon S3 details to power the
If you are upgrading from version 0.2.0, note the new fields:
api_user (leave blank for Singular)
targets, the new fields
ssl_factory to support the SSL security setting on Redshift databases
Deploy the Redshift tables
Before starting Huskimo you must deploy the relevant tables into Redshift. You can find the shared database setup in the file:
If you are extracting data from Twilio, run this script:
If you are extracting data from Singular, run this script:
Make sure to deploy this file against each Redshift database you want to load Singular or Twilio data into.
Schedule Huskimo to run nightly
You are now ready to schedule Huskimo to run daily.
We typically run Huskimo in the early morning so that the data for yesterday is already available (even if rather incomplete). A cron entry for Huskimo might look something like this:
For more details on this release, please check out the Huskimo 0.3.0 release on GitHub.
We will be building a dedicated wiki for Huskimo to support its usage; in the meantime, if you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.
We will be adding support for further SaaS platforms to Huskimo on a case-by-case basis.
We are particularly interested in adding support for more marketing channels, such as Google AdWords or Facebook. Having these datasets available in Redshift alongside your event data should enable some very powerful marketing attribution and return-on-spend analytics.
If you are interested in sponsoring a new integration for Huskimo, do get in touch!