[Tutorial] Adding what3words reverse geocoding data to Snowplow enriched events

27 February 2019  •  Dilyan Damyanov

Snowplow’s API Request Enrichment lets us add dimensions to an incoming Snowplow event using an internal or external HTTP-based API.

In this tutorial, we’ll look at how you can use the enrichment to add a ‘reverse geocoding context’ by doing a lookup against what3words.com’s API.

What is what3words?

what3words has divided the world into a grid of 3m x 3m squares and assigned each one a unique 3 word address. (The main Snowplow London offices for example are at rise.heavy.last.) The service is especially useful in regions with less well developed postal and addressing systems, as it allows locations to be identified and shared precisely and easily. There’s a free mobile app and an online map, and w3w can also be built into any other app, platform, or website.

The reverse geocoding API resolves coordinates, expressed as latitude and longitude, to a 3 word address.

Design considerations

Let’s start by considering what inputs we’ll need, what the API call needs to look like, and what the expected output from it will be.

Inputs

The reverse geocoding API expects to receive coordinates (a comma separated string of latitude and longitude), as well as an API key for authentication. Those are required parameters.

We will get the latitude and longitude data from the enriched event POJO. You will have to sign up with what3words for an API key.

Additionally, there are some optional parameters, some of which affect the format of the response. The enrichment assumes that the API returns JSON, so we need to ensure that we set these appropriately. We should set format=json and not use the callback parameter.

It’s up to you if you want to use the full or terse display option. The schemas in this tutorial assume we’re using display=full.

The optional lang parameter does not affect the format of the response.

API call

We need to construct an API call that looks something like this:

GET https://api.what3words.com/v2/reverse?coords=51.521251,-0.203586&display=full&format=json&key=[API-KEY]
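To make the shape of the request concrete, here is a minimal Python sketch that assembles the same URL from a pair of coordinates. The function name build_reverse_url and the API-KEY placeholder are ours, for illustration only; the enrichment builds the request itself, so this is only useful for trying the API out by hand.

```python
from urllib.parse import urlencode

W3W_REVERSE_ENDPOINT = "https://api.what3words.com/v2/reverse"


def build_reverse_url(lat: float, lng: float, api_key: str) -> str:
    """Build the reverse geocoding request URL (hypothetical helper)."""
    params = {
        "coords": f"{lat},{lng}",  # latitude first, then longitude
        "display": "full",
        "format": "json",
        "key": api_key,
    }
    # safe="," keeps the comma in coords readable instead of encoding it as %2C
    return f"{W3W_REVERSE_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_reverse_url(51.521251, -0.203586, "API-KEY")
print(url)
```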

Outputs

The example JSON output looks like this:

{
    "crs": {
        "type": "link",
        "properties": {
            "href": "http://spatialreference.org/ref/epsg/4326/ogcwkt/",
            "type": "ogcwkt"
        }
    },
    "words": "index.home.raft",
    "bounds": {
        "southwest": {
            "lng": -0.203607,
            "lat": 51.521238
        },
        "northeast": {
            "lng": -0.203564,
            "lat": 51.521265
        }
    },
    "geometry": {
        "lng": -0.203586,
        "lat": 51.521251
    },
    "language": "en",
    "map": "http://w3w.co/index.home.raft",
    "status": {
        "code": 200,
        "message": "OK"
    },
    "thanks": "Thanks from all of us at index.home.raft for using a what3words API"
}
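If you want to poke at a response like this outside the pipeline, a short Python sketch along these lines pulls out the 3 word address. It works on an abridged copy of the example above. The helper name extract_address is hypothetical, and treating any status code other than 200 as a failure is our assumption based on the example, not something taken from the w3w documentation.

```python
import json

# Abridged copy of the example response shown above.
EXAMPLE_RESPONSE = """
{
    "words": "index.home.raft",
    "geometry": {"lng": -0.203586, "lat": 51.521251},
    "language": "en",
    "map": "http://w3w.co/index.home.raft",
    "status": {"code": 200, "message": "OK"}
}
"""


def extract_address(raw_body: str) -> str:
    """Return the 3 word address, or raise if the status looks wrong."""
    body = json.loads(raw_body)
    status = body.get("status", {})
    # Assumption: a code other than 200 means the lookup did not succeed.
    if status.get("code") != 200:
        raise ValueError(f"what3words lookup failed: {status}")
    return body["words"]


print(extract_address(EXAMPLE_RESPONSE))
```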

Implementation

Step 1: Sign up for a what3words API key

Head over to the what3words signup form and register for a Developer account. (It’s free.)

Once you log in, go to Developer API Keys > Manage Applications > Create New. Fill in a few details about how you’re going to use the API key and generate it.

Step 2: Write the schema for the new context

We will be adding a new context to our Snowplow events, so we’ll need a self-describing JSON schema for it: com.what3words/reverse_geocoding_context/jsonschema/1-0-0.

We can use Schema Guru to generate a first draft of the schema from the example response provided by the w3w documentation. Save the example JSON output in a file (let’s call it response.json) and then generate the schema:

$ ./schema-guru-0.6.1 schema path/to/response.json --vendor com.what3words \
    --name reverse_geocoding_context --no-length \
    --output schemas/com.what3words/reverse_geocoding_context/jsonschema/1-0-0

Schemas derived from a single JSON instance can be too restrictive, which is why we’re using the --no-length option to remove min and max bounds for strings. After the schema has been generated, you may want to make it even more permissive. Some common changes include making all string fields nullable (in case there are missing values) and setting additionalProperties to true, to ensure the events will pass validation if w3w adds new fields to the response.
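If you would rather script those two changes than make them by hand, something like the following Python sketch would do it. The loosen function and the tiny draft schema are hypothetical and only illustrate the idea; neither is part of Schema Guru or Igluctl.

```python
import copy


def loosen(schema: dict) -> dict:
    """Return a more permissive copy of a JSON Schema fragment."""
    node = copy.deepcopy(schema)
    if node.get("type") == "string":
        # Allow null so a missing value does not fail validation.
        node["type"] = ["string", "null"]
    elif node.get("type") == "object":
        # Accept fields that the API might add later.
        node["additionalProperties"] = True
        node["properties"] = {
            name: loosen(child)
            for name, child in node.get("properties", {}).items()
        }
    return node


# A cut-down draft, standing in for the generated schema.
draft = {
    "type": "object",
    "properties": {
        "words": {"type": "string"},
        "status": {
            "type": "object",
            "properties": {"code": {"type": "integer"}},
            "additionalProperties": False,
        },
    },
    "additionalProperties": False,
}

relaxed = loosen(draft)
print(relaxed["properties"]["words"])
```

The function works on a copy, so the generated draft stays untouched in case you want to compare the two versions.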

Here’s an example draft created with Schema Guru and then modified by hand:

{
  "$schema" : "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self" : {
    "vendor" : "com.what3words",
    "name" : "reverse_geocoding_context",
    "version" : "1-0-0",
    "format" : "jsonschema"
  },
  "type" : "object",
  "properties" : {
    "map" : {
      "type" : "string",
      "format" : "uri"
    },
    "thanks" : {
      "type" : "string"
    },
    "bounds" : {
      "type" : "object",
      "properties" : {
        "southwest" : {
          "type" : "object",
          "properties" : {
            "lng" : {
              "type" : "number"
            },
            "lat" : {
              "type" : "number"
            }
          },
          "additionalProperties" : true
        },
        "northeast" : {
          "type" : "object",
          "properties" : {
            "lng" : {
              "type" : "number"
            },
            "lat" : {
              "type" : "number"
            }
          },
          "additionalProperties" : true
        }
      },
      "additionalProperties" : true
    },
    "language" : {
      "type" : "string"
    },
    "status" : {
      "type" : "object",
      "properties" : {
        "code" : {
          "type" : "integer"
        },
        "message" : {
          "type" : "string"
        }
      },
      "additionalProperties" : true
    },
    "words" : {
      "type" : "string"
    },
    "geometry" : {
      "type" : "object",
      "properties" : {
        "lng" : {
          "type" : "number"
        },
        "lat" : {
          "type" : "number"
        }
      },
      "additionalProperties" : true
    },
    "crs" : {
      "type" : "object",
      "properties" : {
        "type" : {
          "type" : "string"
        },
        "properties" : {
          "type" : "object",
          "properties" : {
            "href" : {
              "type" : "string",
              "format" : "uri"
            },
            "type" : {
              "type" : "string"
            }
          },
          "additionalProperties" : true
        }
      },
      "additionalProperties" : true
    }
  },
  "additionalProperties" : true
}

You can also use Schema Guru to generate Redshift DDLs and JSON Paths files. Alternatively, if you’re more familiar with it or are working off a schema you wrote from scratch, you can use Igluctl to do the same. If you’re using a different storage target, such as BigQuery or Snowflake, you don’t need to worry about the DDL: the respective loader apps in the pipeline will figure it out.

Step 3: Write the enrichment configuration

Next up, we need to write the JSON config file for the enrichment. This file should be called api_request_enrichment_config.json and it should be placed in the folder where all the rest of your enrichment configurations live, so it’s accessible to the pipeline. The API Request Enrichment documentation page has a link to the JSON schema for the enrichment config file, a detailed example, and more information on how things work under the hood.

We ultimately want to end up with a file that looks like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "api_request_enrichment_config",
    "enabled": true,
    "parameters": {
      "inputs": [
        {
          "key": "lat",
          "pojo": {
            "field": "geo_latitude"
          }
        },
        {
          "key": "lng",
          "pojo": {
            "field": "geo_longitude"
          }
        }
      ],
      "api": {
        "http": {
          "method": "GET",
          "uri": "https://api.what3words.com/v2/reverse?coords={{lat}},{{lng}}&display=full&format=json&key=API-KEY",
          "timeout": 2000,
          "authentication": {
            "httpBasic": {
              "username": "",
              "password": ""
            }
          }
        }
      },
      "outputs": [
        {
          "schema": "iglu:com.what3words/reverse_geocoding_context/jsonschema/1-0-0",
          "json": {
            "jsonPath": "$"
          }
        }
      ],
      "cache": {
        "size": 3000,
        "ttl": 60
      }
    }
  }
}

Let’s look at the different parameters in turn.

inputs

We need two data points from the enriched event POJO: geo_latitude and geo_longitude. We assign the values found there to the lat and lng keys, respectively.

api

We then use the lat and lng keys to refer to the geo_latitude and geo_longitude values in the API call, by writing them into the URI as the placeholders {{lat}} and {{lng}}. The values extracted from the event will be substituted in the URI before the GET request is submitted.
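The substitution step can be pictured with a small Python sketch. It assumes each key is marked in the URI template with double curly braces, such as {{lat}}. The fill_uri helper is hypothetical, and both the URL-encoding of values and the choice to skip the lookup when an input is missing are our assumptions about sensible behaviour rather than a description of the enrichment’s internals.

```python
import re
from typing import Optional
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_uri(template: str, inputs: dict) -> Optional[str]:
    """Substitute {{key}} placeholders; return None if an input is missing."""
    keys = PLACEHOLDER.findall(template)
    if any(inputs.get(key) is None for key in keys):
        # Assumption: skip the lookup rather than send a broken request.
        return None
    # Assumption: values are URL-encoded before being placed in the URI.
    return PLACEHOLDER.sub(
        lambda match: quote(str(inputs[match.group(1)]), safe=""), template
    )


template = (
    "https://api.what3words.com/v2/reverse"
    "?coords={{lat}},{{lng}}&display=full&format=json&key=API-KEY"
)
print(fill_uri(template, {"lat": 51.521251, "lng": -0.203586}))
print(fill_uri(template, {"lat": 51.521251}))  # lng missing, so no URI
```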

outputs

The API responds with JSON that matches our custom reverse_geocoding_context schema, so we take all of the data from the response by specifying "jsonPath": "$".
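In effect, the whole response body becomes the data of a self-describing JSON that carries our schema URI. Here is a rough Python sketch of that wrapping; the to_context helper is hypothetical and the response is abridged, so treat it as an illustration of the resulting shape rather than of how the enrichment is implemented.

```python
import json

CONTEXT_SCHEMA = "iglu:com.what3words/reverse_geocoding_context/jsonschema/1-0-0"


def to_context(response_body: str) -> dict:
    """Wrap the full response (JSONPath "$") as a self-describing JSON."""
    body = json.loads(response_body)  # "$" selects the root of the document
    return {"schema": CONTEXT_SCHEMA, "data": body}


# Abridged response, for illustration only.
response = '{"words": "index.home.raft", "language": "en"}'
context = to_context(response)
print(json.dumps(context, indent=2))
```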

cache

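A heavy-traffic pipeline might generate millions of calls to the specified API endpoint in a very short period of time. To cut down on repeat lookups, the enrichment caches API responses. We can use this section to set some reasonable limits: size is the maximum number of entries held in the cache, and ttl is the ‘time to live’ of each entry, in seconds.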
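To see how the two settings interact, here is a toy Python cache with a size cap and a time to live. It is a sketch of the general technique only: the TtlCache class is hypothetical, the least-recently-used eviction is our choice, and the enrichment’s actual cache may well behave differently.

```python
import time
from collections import OrderedDict


class TtlCache:
    """Toy cache bounded by entry count (size) and entry age (ttl)."""

    def __init__(self, size, ttl, clock=time.monotonic):
        self.size = size    # maximum number of entries
        self.ttl = ttl      # seconds an entry stays valid
        self.clock = clock  # injectable so the cache is easy to test
        self._entries = OrderedDict()  # key -> (stored_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired, so drop it
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # evict least recently used


cache = TtlCache(size=3000, ttl=60)
cache.put("51.521251,-0.203586", "index.home.raft")
print(cache.get("51.521251,-0.203586"))
```

With these settings, repeated events from the same coordinates within a minute would be served from the cache instead of triggering another API call.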

Step 4: Test and deploy

We can now use Snowplow Mini to test our new enrichment setup. Refer to the Setup Guide for details on how to set up Snowplow Mini on AWS or GCP. The Usage Guide is the best resource on how it works once set up. We have to add the enrichment as well as upload the custom schema for it. We can also do the latter with Igluctl:

$ ./igluctl static push ./schemas/com.what3words/reverse_geocoding_context/jsonschema/1-0-0 $SNOWPLOW_MINI_IP/iglu-server/ $IGLU_REGISTRY_MASTER_KEY --public

When uploading new schemas and enrichment configurations, you might need to restart all services from the Control Plane of the Snowplow Mini console, to ensure the cache is flushed. Then, we can test whether the enrichment is working by sending some test events.

Finally, we can use the Kibana dashboard that comes bundled with Snowplow Mini to inspect the data and verify that the new context is being successfully attached:

[Screenshot: Kibana dashboard]

What else is possible with the API request enrichment?

This tutorial is an adaptation of the Integrating Clearbit data into Snowplow tutorial. We’re always interested to hear how else people have been using the API Request Enrichment, so please share if you have a cool use case.