Compared to the colourful entities and verbs, event context looked like simply “the intangible everything else” - the amorphous whitespace around our core event. Nothing could be further from the truth - event context is in fact tangible, easily recorded and hugely valuable for analysis.
Simply put, event context describes the environment and manner in which an event took place. There are strong parallels between our view of event context and the adverbial concept of “time, manner, place” used in human language to describe events. Context is used to describe, among other things:
In fact a large proportion of Snowplow’s existing Canonical Event Model is describing the context of the given event - as we will explore in the next section.
Today, our Canonical Event Model contains 98 fields, and by our reckoning 57 of those fields solely relate to event context in one form or another. Here is a brief summary of the context already present:
|Context category||Example fields||Description|
|Temporal||When this event took place|
|Geographical||Where in the real-world this event took place|
|Environmental||The computing environment in which this event took place|
|Narratorial||Who narrated this event (our data pipeline)|
|Antecedental||What occurred prior to (and potentially caused) this event|
As you can see, our Canonical Event Model is chock full of context! But not all of this context is created equal - in the next section we will explore where context comes from, and how reliable it is.
It might be natural to assume that all event context is captured at the point of creating (“tracking” in Snowplow language) an event. In fact things are not that simple - but we can think of context coming from three distinct sources:
To dive into a couple of examples:
Primary and secondary timestamps
For a simple comparison between primary and secondary context, consider our two event timestamps:
Both of these are pieces of temporal context, but they originate from different places. And interestingly, they have very different reliability profiles and thus use-cases:
dvce_tstampis set by the client’s system clock, which is frequently incorrect
collector_tstampis set by the collector’s server clock, which is accurate but can only record when the event was collected, not created
Thus for absolute analyses across multiple users,
collector_tstamp provides the best temporal information. However, this introduces some uncertainty around the ordering of events which happened close together, so for e.g. a funnel analysis tied to a specific user,
dvce_tstamp would be the best temporal context.
Derived geographical and meteorological context
Additionally, it is possible to derive new context from one or more pieces of existing context. Here is an illustration of this:
As you can see here, we collect
collector_tstamp as pieces of secondary context in the collector. Then in the Enrichment phase, we are able to derive a new set of geographical context (
geo_longitude etc) by performing a MaxMind geo-IP lookup on the user’s
To push this example further: we could potentially then use the
geo_longitude to derive weather context from that information. This is not an Enrichment currently supported by Snowplow, but it is a great example of a second-order derived context.
There’s one more complexity I’d like to discuss before wrapping up, which could be summed up by:
One event’s context is another event’s object (or subject or…)
Let’s demonstrate this by comparing two events. In the first, a customer is viewing a web page:
In the second event, the customer is now adding an item to their basket:
But crucially, in the second event, the customer is still on a web page. This web page is no longer the direct object of the event - but it is still relevant information: it gives us spatial context, on where the event took place.
Thus we can see that one event’s direct object becomes context for another event; in both events, we are modelling some kind of
web_page entity, but it serves different roles in both events.
We introduced a closely-related concept in our blog post Towards Universal Event Analytics with talk of prepositional objects:
the first player (Subject) kills (Verb) the second player (Direct Object) using a nailgun (Prepositional Object)
As we evolve our event grammar further, we will need to consider whether prepositional objects and context should be treated separately, or merged into one broader concept.
Event context provides a huge amount of valuable metadata around our core subject-verb-object grammar, as evidenced by the large proportion of our Canonical Event Model which is given over to context.
If the core subject-verb-object dynamic tells us who did what to whom, then context tells us something much more discursive and subjective, but much richer too: how was it done, where was it done, why was it done. We are hugely excited about all forms of context at Snowplow, and we believe there is much more context still to be collected, not least:
But the challenge of context is not just to accrete as much context as possible - we also want to work to structure and schema the context we already have better. Currently Snowplow stores contextual information in our “fat” Redshift table using a simple namespacing approach. Rest assured that we are exploring cleaner ways of storing this information as part of our wider event grammar research!