Towards universal event analytics - building an event grammar

As we outgrow our “fat table” structure for Snowplow events in Redshift, we have been spending more time thinking about how we can model digital events in Snowplow in the most universal, flexible and future-proof way possible.

When we blogged about building out the Snowplow event model earlier this year, a comment left on that post by Loic Dias Da Silva made us realize that we were missing an even more fundamental point: defining a Snowplow event grammar to underpin our Snowplow event dictionary. Here is part of Loic’s excellent comment - although I would encourage you to read it in full on the blog post:

Hi, we’re also working on an event model for our global eventing platform but our events currently are more macro, inspired by RDF in a sense:

An Actor(id/type) made an Action(verb, context) on another Object(id/type).

Each Actor, Action and Object can hold k/v properties.

The context itself, owned by the action, is a k/v dictionary.

So in designing his event grammar, Loic was influenced by the Resource Description Framework, the W3C specifications for modelling relationships to web resources.

An event grammar inspired by RDF is certainly interesting, but I am using a much older, more sophisticated and more tested “event grammar” to write this sentence: the grammar of human language. Why not start, then, from the core grammar underpinning English, Latin, Greek, German and other languages to see just how far this approach can take us in modelling events in the digital world?

So, in the rest of this post we will:

  1. Introduce the components of our grammar
  2. Model some ecommerce events
  3. Model some videogame events
  4. Model some digital media events
  5. Discuss what we have learnt
  6. Draw some conclusions

1. The components of our grammar

All of the human languages mentioned above (and many, many others) share the same fundamental building blocks in their grammars for describing an event with a verb in the active voice:

[Figure: grammar]

To go through these in turn:

  • Subject, or noun in the nominative case. This is the entity which is carrying out the action: “I wrote a letter”
  • Verb. This describes the action being done by the Subject: “I wrote a letter”
  • Direct Object, or simply Object or noun in the accusative case. This is the entity to which the action is being done: “I wrote a letter”
  • Indirect Object, or noun in the dative case. A slightly more tricky concept: this is the entity indirectly affected by the action: “I sent the letter to Tom”
  • Prepositional Object. An object introduced by a preposition (in, for, of etc), but not the direct or indirect object: “I put the letter in an envelope”. In a language such as German, prepositional objects will be found in the accusative, dative or genitive case depending on the preposition used
  • Context. Not a grammatical term, but we will use context to describe the phrases of time, manner, place and so on which provide additional information about the action being performed: “I posted the letter on Tuesday from Boston”
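One way to picture these six building blocks is as the fields of a single record. The following is a minimal Python sketch rather than an actual Snowplow schema: the class names `Entity` and `Event`, the field names and the identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Entity:
    """A noun in the grammar: anything that can fill a Subject or Object slot."""
    type: str  # hypothetical entity type, e.g. "person" or "letter"
    id: str    # hypothetical identifier, unique within its type


@dataclass
class Event:
    """One event, expressed as a sentence in the active voice."""
    subject: Entity                                # who carries out the action
    verb: str                                      # the action itself
    direct_object: Optional[Entity] = None         # what the action is done to
    indirect_object: Optional[Entity] = None       # who or what is indirectly affected
    prepositional_object: Optional[Entity] = None  # object introduced by a preposition
    context: Dict[str, str] = field(default_factory=dict)  # time, manner, place


# "I sent the letter to Tom"
sent = Event(
    subject=Entity("person", "me"),
    verb="send",
    direct_object=Entity("letter", "letter-1"),
    indirect_object=Entity("person", "tom"),
)

# "I posted the letter on Tuesday from Boston"
posted = Event(
    subject=Entity("person", "me"),
    verb="post",
    direct_object=Entity("letter", "letter-1"),
    context={"when": "Tuesday", "where": "Boston"},
)
```

Only the Subject and the Verb are required in this sketch; every other slot defaults to empty, which mirrors the way the example sentences above each use a different subset of the slots.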

With these grammatical building blocks defined, let’s now put them through their paces modelling some digital events - starting with some online retail events:

2. Modelling some ecommerce events

Here are some ecommerce events mapped to our grammatical model:

[Figure: ecomm1]

In this event, a shopper (Subject) views (Verb) a t-shirt (Direct Object) while browsing an online store (Context).

[Figure: ecomm2]

Here we introduce an Indirect Object which has been affected by the event: the shopper (Subject) adds (Verb) a t-shirt (Direct Object) to her shopping basket (Indirect Object). Again, this is while browsing the online store (Context).

[Figure: ecomm3]

Here we have an Object introduced by preposition: the shopper (Subject) pays (Verb) for his order (Prepositional Object). This is all within the checkout flow (Context).
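As a rough illustration, the three ecommerce events above could be written down as plain records, with each event simply leaving out the slots it does not use. The key names and the helper `filled_slots` are hypothetical and chosen only for this sketch.

```python
# The six slots of the grammar, in a fixed canonical order.
SLOTS = (
    "subject",
    "verb",
    "direct_object",
    "indirect_object",
    "prepositional_object",
    "context",
)

ecommerce_events = [
    # A shopper views a t-shirt while browsing an online store.
    {"subject": "shopper", "verb": "view", "direct_object": "t-shirt",
     "context": "browsing online store"},
    # The shopper adds a t-shirt to her shopping basket.
    {"subject": "shopper", "verb": "add", "direct_object": "t-shirt",
     "indirect_object": "shopping basket", "context": "browsing online store"},
    # The shopper pays for his order within the checkout flow.
    {"subject": "shopper", "verb": "pay", "prepositional_object": "order",
     "context": "checkout flow"},
]


def filled_slots(event):
    """Return the names of the grammar slots an event uses, in canonical order."""
    return [slot for slot in SLOTS if slot in event]


for event in ecommerce_events:
    print(event["verb"], "->", filled_slots(event))
```

Note that the payment event has no Direct Object at all: the order sits in the Prepositional Object slot, exactly as in the sentence "the shopper pays for his order".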

3. Modelling some videogame events

So far so good, but how well does this model work with events generated by a gaming session?

[Figure: videogame1]

In a gifting screen within the game (Context), the player (Subject) gifts (Verb) some gold (Direct Object) to another player (Indirect Object).

[Figure: videogame2]

During a two-player skirmish (Context), the first player (Subject) kills (Verb) the second player (Direct Object) using a nailgun (Prepositional Object). This illustrates how your end-users can be the Object of events, not just their Subjects.

[Figure: videogame3]

Here we illustrate a reflexive verb: through grinding (Context), the player (Subject) levels herself up (Verb, reflexive). A reflexive Verb is one where the Subject and the Object are the same.
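The definition of a reflexive Verb translates directly into a check on an event record. This is a small sketch under assumed names: the `Event` tuple, the `is_reflexive` helper and the player identifiers are hypothetical.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    subject: str
    verb: str
    direct_object: Optional[str] = None


def is_reflexive(event: Event) -> bool:
    """An event is reflexive when its Subject and Direct Object are the same entity."""
    return event.direct_object is not None and event.direct_object == event.subject


# The player levels herself up: Subject and Direct Object coincide.
level_up = Event(subject="player:1", verb="level up", direct_object="player:1")

# The first player kills the second player: two distinct entities.
kill = Event(subject="player:1", verb="kill", direct_object="player:2")

print(is_reflexive(level_up), is_reflexive(kill))
```

Storing the reflexive case as an ordinary event whose two slots hold the same entity means no special "reflexive" flag is needed in the model.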

4. Modelling some digital media events

This seems to be working well! Finally, let’s map our new event grammar onto the world of digital media and publishing:

[Figure: media1]

While consuming media on your site (Context), a user (Subject) reads (Verb) an article (Direct Object).

[Figure: media2]

Wanting to share content socially (Context), a user (Subject) shares (Verb) a video (Direct Object) on Twitter (Prepositional Object). Also note that Twitter here is a proper noun (not a common noun).

[Figure: media3]

Working from the moderation UI (Context), an administrator (Subject) bans (Verb) user #23 (Direct Object). This illustrates how an end-user can be the Object of an event, and how someone other than an end-user can be the Subject of the event.

5. What have we learnt

As you can see, it is relatively straightforward to map any of the digital events above into these six “slots”: Subject, Verb, Direct Object, Indirect Object, Prepositional Object and Context. This is unsurprising: our core grammar has been unambiguously describing events in many different human languages across thousands of years.

Going through the above exercise, several further things have become clear to us that we will want to factor into the Snowplow event grammar going forwards:

Implicit Subjects are a mistake

Most web and event analytics systems make the mistake of making the Subject of the event implicit:

(End user) adds product to basket
(Admin) bans user #23

This is a mistake, because as we have seen above, expressing the Subject is a key component of our event grammar.

Going further, it is particularly dangerous to assume that the Subject of every event is your end-user or customer: in the moderation example above, the Subject is an administrator and the end-user is the Object.
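One way to guard against implicit Subjects is to refuse any event that arrives without one, rather than silently filling in the end-user. The sketch below assumes events are plain dictionaries; the function name `require_subject` and the identifiers are hypothetical.

```python
def require_subject(event: dict) -> dict:
    """Reject events that leave the Subject slot empty instead of guessing it."""
    if not event.get("subject"):
        raise ValueError(f"event {event.get('verb')!r} has no explicit subject")
    return event


# Both Subjects are spelled out, so it is clear who acted in each event.
add_to_basket = require_subject(
    {"subject": "end-user:42", "verb": "add", "direct_object": "product",
     "indirect_object": "basket"}
)
ban = require_subject(
    {"subject": "admin:7", "verb": "ban", "direct_object": "user:23"}
)

# An event with an implicit Subject is rejected rather than attributed
# to the end-user by default.
try:
    require_subject({"verb": "ban", "direct_object": "user:23"})
except ValueError as error:
    print(error)
```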

An entity can be Subject or Object or both across multiple events

As per these gaming examples:

User #1 gifts gold to user #2
User #2 kills user #3
User #2 levels up
Admin bans user #1

As we can see from this, the same entities will be found as Subject, Direct Object, Indirect Object or Prepositional Object depending on the event.

Most analytics systems miss the fact that an end-user (for example) is not merely the implicit Subject of multiple events, but is in fact an entity which is the Subject and the Object of different events.
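To make this concrete, here is a short sketch that takes the four gaming events above and collects, for each entity, every slot in which it appears. The dictionary layout and the `roles_by_entity` helper are assumptions for illustration, and "levels up" is modelled as reflexive, following the videogame example earlier.

```python
from collections import defaultdict

# Slots that hold entities (the Verb and Context slots do not).
ENTITY_SLOTS = ("subject", "direct_object", "indirect_object", "prepositional_object")

events = [
    {"subject": "user:1", "verb": "gift", "direct_object": "gold",
     "indirect_object": "user:2"},
    {"subject": "user:2", "verb": "kill", "direct_object": "user:3"},
    {"subject": "user:2", "verb": "level up", "direct_object": "user:2"},
    {"subject": "admin", "verb": "ban", "direct_object": "user:1"},
]


def roles_by_entity(events):
    """Map each entity to the set of grammar slots it fills across all events."""
    roles = defaultdict(set)
    for event in events:
        for slot in ENTITY_SLOTS:
            entity = event.get(slot)
            if entity is not None:
                roles[entity].add(slot)
    return dict(roles)


roles = roles_by_entity(events)
for entity in sorted(roles):
    print(entity, sorted(roles[entity]))
```

In this sketch user #2 turns up as Subject, Direct Object and Indirect Object, which is exactly the information lost when a system treats the end-user as nothing more than an implicit Subject.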

We can keep our Verbs really simple

All of the events above were modelled simply using verbs in the active voice, not the passive voice:

  • Active voice: “I watch a video”
  • Passive voice: “the video was watched by Alex”

We don’t need to use passive voice for our event model, because we can always derive (if needed) a passive voice event from our active voice event.

Going further, Verbs conjugate in lots of other ways (tense, person, mood etc) - but again we don’t need to include any of this in our event model: all of this can be derived (if needed) from our event’s Context.
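Deriving the passive voice from a stored active-voice event can be sketched in a few lines. The `PAST_PARTICIPLES` lookup and the `to_passive` function are hypothetical helpers; a lookup is used because English participles are irregular and cannot be produced from the verb by rule alone.

```python
# Hypothetical lookup of past participles for the verbs used in this sketch.
PAST_PARTICIPLES = {"watch": "watched", "ban": "banned", "gift": "gifted"}


def to_passive(event: dict) -> str:
    """Render an active-voice event as a passive-voice sentence."""
    participle = PAST_PARTICIPLES[event["verb"]]
    return f"{event['direct_object']} was {participle} by {event['subject']}"


# Stored once, in the active voice: "Alex watches the video".
watch = {"subject": "Alex", "verb": "watch", "direct_object": "the video"}

print(to_passive(watch))  # the video was watched by Alex
```

The stored event never changes; the passive form is just a different rendering of the same Subject, Verb and Direct Object.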

Context is king

Our idea of Context does not map cleanly onto a single grammatical component, but it is just too useful to exclude. In fact, we already have a de facto rich web context for Snowplow events in our Canonical event model, including:

  • When the event occurred
  • Where (geographically) the event occurred
  • Properties of the device on which the event occurred

6. Conclusions

We hope this has been an interesting exploration of how we can potentially adapt and simplify the grammar of human languages to express a new grammar for digital events. We are really excited about the possibilities this opens up - initially around expressing such a grammar in our new Avro event model, and later hopefully in graph databases such as Neo4J.

Of course, we have only just started to sketch out this new event model, and we hope that it will prompt a wider debate with the Snowplow and analytics communities. We are excited to evolve these ideas and build a model for universal event analytics with you, together - and we look forward to continuing the conversation on our snowplow-user mailing list.

And finally, many thanks again to Loic Dias Da Silva for sharing his original Actor-Action-Object idea on our blog!


Alex Dean

Alex is co-founder and technical lead at Snowplow. You can find him on Twitter and LinkedIn.