Snowplow’s ETL process outputs enriched events in a TSV. This TSV currently has 131 fields, which can make it difficult to work with directly. The Snowplow Python Analytics SDK currently supports one transformation: turning this TSV into a more tractable JSON.
The transformation algorithm used to do this is the same as the one used in the Kinesis Elasticsearch Sink and the Snowplow Scala Analytics SDK, with one exception: when a field of the input TSV is empty, we leave that field out of the output JSON entirely rather than using a field with the value null. Here is an example output JSON:
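The empty-field rule can be illustrated with a minimal sketch. This is not the SDK's own code: the function name `tsv_to_json` and the three-field `FIELD_NAMES` list are hypothetical, chosen only to keep the example short (a real enriched event has 131 fields).

```python
import json

# Hypothetical three-field subset, for illustration only;
# a real enriched event has 131 fields.
FIELD_NAMES = ["app_id", "platform", "event"]


def tsv_to_json(line):
    """Map tab-separated values to field names, omitting empty fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(values))
        )
    # Empty fields are dropped rather than emitted as null.
    return {name: value for name, value in zip(FIELD_NAMES, values) if value != ""}


# The second field is empty, so "platform" is absent from the output.
print(json.dumps(tsv_to_json("shop\t\tpage_view"), sort_keys=True))
```

Because `platform` is empty in the input line, the printed object has only the `app_id` and `event` keys.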
There are special rules for how custom contexts and unstructured events are added to the JSON. For example, if an enriched event contained a com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1 unstructured event, then the final JSON would contain:
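As a rough sketch of how the key for such an event might be derived, the snippet below assumes the naming convention is: a prefix, then the vendor with dots replaced by underscores, then the schema name in snake case, then the model (first) number of the version. The helper name `schema_to_key` is hypothetical, and the rule itself should be checked against the Kinesis Elasticsearch Sink wiki page.

```python
import re


def schema_to_key(prefix, schema):
    """Build a JSON key from a schema path (assumed naming rule, see lead-in)."""
    # Accept paths with or without a leading "iglu:" scheme.
    vendor, name, _format, version = schema.replace("iglu:", "", 1).split("/")
    model = version.split("-")[0]  # keep only the model part of 1-0-1
    snake_vendor = vendor.replace(".", "_").lower()
    # Convert camelCase names to snake_case; snake_case names pass through.
    snake_name = re.sub(r"([^A-Z_])([A-Z])", r"\1_\2", name).lower()
    return "{}_{}_{}_{}".format(prefix, snake_vendor, snake_name, model)


print(schema_to_key(
    "unstruct_event",
    "com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
))
```

Under that assumption, a custom context would get a key built the same way with a `contexts` prefix.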
For more examples and detail on the algorithm used, check out the Kinesis Elasticsearch Sink wiki page.
The SDK is available on PyPI:
Use the SDK like this:
If there are any problems in the input TSV (such as unparseable JSON or numeric fields), the transform method will raise a SnowplowEventTransformationException. This exception contains a list of error messages, one for every problematic field in the input.
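The "one message per problematic field" behaviour can be sketched as follows. It is a self-contained illustration, not the SDK's implementation: `TransformationError` is a hypothetical stand-in for SnowplowEventTransformationException, and the `error_messages` attribute name, the `CONVERTERS` table, and `convert_fields` are all assumptions made for the example.

```python
import json


class TransformationError(Exception):
    """Hypothetical stand-in for SnowplowEventTransformationException."""

    def __init__(self, error_messages):
        super().__init__("; ".join(error_messages))
        self.error_messages = error_messages  # assumed attribute name


# Hypothetical per-field converters for a tiny subset of fields.
CONVERTERS = {"event": str, "txn_id": int, "contexts": json.loads}


def convert_fields(raw):
    """Convert every field, collecting all failures before raising."""
    output, errors = {}, []
    for name, value in raw.items():
        if value == "":
            continue  # empty fields are omitted from the output
        try:
            output[name] = CONVERTERS[name](value)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            errors.append("Field %s: %s" % (name, exc))
    if errors:
        raise TransformationError(errors)
    return output


try:
    convert_fields({"event": "page_view", "txn_id": "abc", "contexts": "{"})
except TransformationError as e:
    for message in e.error_messages:
        print(message)
```

Collecting the errors instead of stopping at the first one means a single failed call reports every bad field in the event.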
For more information, check out the Python Analytics SDK wiki page.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.
And if there’s another Snowplow Analytics SDK you’d like us to prioritize creating, please let us know on the forums!