The unrivaled power of joining client- and server-side tracking: lessons from the Ruby tracker

We are pleased to announce the recent release of the Snowplow Ruby tracker 0.8.0. We’ve added several new features and improvements, some of which are discussed later in this post. One of the improvements is new documentation: all-new Snowplow docs, API docs, and an example repo using Ruby on Rails.

This blog post will focus on a broader topic than the Ruby tracker, namely how to use both client- and server-side tracking within one app. This structure can be seen in the demo app, which implements both the Snowplow Ruby and JavaScript trackers.

Why track server- and client-side?

Broadly, server-side tracking is more accurate, while client-side tracking can easily track dynamic events. Client-side events are richer out-of-the-box because they automatically track, for example, device type. But the server has access to other useful data. Therefore implementing both types gives the best of both worlds.

We are discussing mainly the JavaScript and Ruby trackers here, but the same principles apply to all client-side and server-side trackers.

Let’s start by considering website page views (page loads). Both the server and the client can track when a page loads. The JavaScript tracker has easy access to page data such as the full URL of the page itself and the referrer, including marketing UTM parameters, and the page’s title. This makes the client the logical place to track the page view. However, client-side events are susceptible to being blocked by adblockers, while server-side tracking is inherently unaffected. By tracking the same events in both client and server, it’s possible to estimate how much of your client-side tracking is being intercepted by adblockers. One of our new customers found that almost 20% of their website tracking had previously been lost.
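As a rough illustration of that estimate, the sketch below compares page view counts from the two trackers. The function name and the counts are hypothetical; in practice the numbers would come from a query over your own event data.

```ruby
# Estimate the share of client-side page views that never arrived,
# using the server-side page view count as the baseline.
# Both counts are hypothetical inputs, e.g. taken from a warehouse query.
def estimated_client_loss(server_page_views, client_page_views)
  return 0.0 if server_page_views.zero?

  # Clamp at zero in case the client happens to report more than the server.
  missing = [server_page_views - client_page_views, 0].max
  missing.to_f / server_page_views
end

loss = estimated_client_loss(10_000, 8_100)
puts format("Estimated client-side loss: %.1f%%", loss * 100)
```

Treat the result as an upper bound rather than an exact figure: the server also sees requests from bots and from browsers with JavaScript disabled, which never produce client-side events.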

For some event types, it’s clear where tracking should occur. The “add-to-basket” event in the Rails demo is generated when the user clicks on an “add to basket” button, and is processed as dynamic JavaScript. It’s convenient to track this activity client-side; the server doesn’t automatically know anything about the basket contents. However, if users adding things to their basket is an important part of your business use case, then you might want to consider tracking this server-side as well.

It’s easier to implement tracking if events are tracked in the most natural place for their type. For example, a user logging in makes sense as a server-side event, as the authentication is handled server-side, and the server already has the relevant data. However, the user submitting the login form is a client-side event. By tracking both events, you get the best picture of the user’s behavior and the effect – successful or failed login – that followed. Note that for simplicity, authentication is not incorporated into the Rails demo.

This extends to other database-related user behavior. Creating a blog post, for example, is usually a “CRUD” storage activity (create, read, update, or delete). A client-side event could be triggered when the user touches the “publish post” button. But the insertion of the new post into storage is most accurately tracked server-side, where it occurs.

Let’s follow a user’s journey through our eCommerce website. They have visited the home page (JS page view, Ruby page view), visited the shop and browsed the products (JS page views, Ruby page views, JS page pings for activity tracking), and then added some green skis to their basket (JS add to basket). Now they want to purchase the skis. Generally, events relating to revenue should be tracked server-side (Ruby purchase). This is partly because they are processed on the server, but it’s also important to use the most accurate tracking where revenue is involved.

What if the user didn’t like the skis once they arrived, and returned them? This might occur offline, and not be visible to the client at all. Tracking this activity as it’s processed by the server complements your transactional dataset, and helps you build the most accurate picture of your users and their behavior.

Getting the most from your data

Your code should be DRY, but not your event data. At Snowplow, we want your behavioral dataset to be as rich and valuable as possible, and be ready to use with minimal effort. By adding as much data as possible to each individual event, you can reduce the amount of time spent preparing and cleaning the data. This gives you, or your analysts, more time for generating insights.

Snowplow events are structured to help with this, with over 100 possible properties available to set directly: so-called “atomic” event properties. Client-side trackers automatically set many of these. On top of this, you can add event entities, which we will come back to shortly.

If you are capturing an event with both client-side and server-side tracking, how can you tell that the two events describe the same user activity?

For websites, the client-side trackers set two first-party cookies. The main cookie, _sp_id, stores user data including a unique user ID (domain_userid), a unique session ID (domain_sessionid), and the number of visits (sessions) the user has made to the site (domain_sessionidx). In JavaScript tracking, these atomic parameters are automatically added to each event. It’s possible to extract the cookie values server-side, so they can be set for server-side events. In the Ruby tracker demo app, we show how to do this simply for domain_userid, for page view events.

Extracting the cookie value, which is subsequently passed to the tracker:

# in ApplicationController

def snowplow_domain_userid
  sp_cookie = cookies.find { |key, _value| key =~ /^_sp_id/ }
  sp_cookie.last.split(".").first if sp_cookie.present?
end

This is set in the tracker code using the set_domain_user_id method:

@tracker.set_domain_user_id(domain_userid) unless domain_userid.nil?

Combined with other event properties, such as the page_url itself and the event timestamps, it now becomes possible to determine that the two page views are related. Other client-side atomic user properties can also be passed to the server and set in server-side events, such as IP address or user agent. On mobile, although there are no cookies, you could similarly pass the client-side unique user identifiers to the server. These would be set using event context, rather than the atomic event properties.
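To make the matching idea concrete, here is a minimal sketch in plain Ruby. The Event struct, the stitch_page_views function, and the five-second window are all assumptions for illustration; real Snowplow events carry many more fields, and this kind of stitching would normally happen in the data warehouse rather than in application code.

```ruby
# Hypothetical, simplified event record for illustration only.
Event = Struct.new(:source, :domain_userid, :page_url, :tstamp, keyword_init: true)

# Pair each server-side page view with the closest client-side page view
# that shares domain_userid and page_url within the time window (in seconds).
def stitch_page_views(server_events, client_events, window: 5)
  server_events.map do |server|
    candidates = client_events.select do |client|
      client.domain_userid == server.domain_userid &&
        client.page_url == server.page_url &&
        (client.tstamp - server.tstamp).abs <= window
    end
    match = candidates.min_by { |client| (client.tstamp - server.tstamp).abs }
    [server, match] # match is nil when no client-side event was found
  end
end

t0 = Time.utc(2021, 6, 1, 10, 0, 0)
server_events = [
  Event.new(source: "ruby", domain_userid: "user-a", page_url: "https://example.com/shop", tstamp: t0),
  Event.new(source: "ruby", domain_userid: "user-b", page_url: "https://example.com/shop", tstamp: t0)
]
client_events = [
  Event.new(source: "js", domain_userid: "user-a", page_url: "https://example.com/shop", tstamp: t0 + 1.2)
]

stitch_page_views(server_events, client_events).each do |server, client|
  status = client ? "matched" : "no client-side event"
  puts "#{server.domain_userid}: #{status}"
end
```

A server-side page view with no client-side partner is a candidate for a blocked or failed client-side event.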

Event context is one of the most important aspects of Snowplow tracking. Any number of strictly described JSON “entities” can be attached to events to provide additional data beyond the atomic properties. For example, the JavaScript trackers automatically attach a web page entity, whose sole parameter is an ID unique to that page load. This helps data modeling by allowing the easy identification of events that occurred during the same page view.

What about passing information in the other direction? Your client-side trackers can also request data from the server. For example, you could generate a unique server-side page load ID, and pass it to the client-side tracker. You can then attach that ID as an entity to all events from both trackers, making it easy to stitch them together.
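A minimal sketch of the server-side half of that idea follows. The schema URI and the page_load_entity helper are hypothetical; a real schema would live in your own Iglu registry, and with the Ruby tracker the schema and data pair would typically be wrapped in a SelfDescribingJson object rather than left as a plain hash.

```ruby
require "json"
require "securerandom"

# Hypothetical schema URI, used here only for illustration.
PAGE_LOAD_SCHEMA = "iglu:com.example/page_load/jsonschema/1-0-0"

# Build a self-describing entity (schema plus data) as a plain hash.
def page_load_entity(page_load_id)
  { schema: PAGE_LOAD_SCHEMA, data: { id: page_load_id } }
end

# Generate one ID per page load on the server.
page_load_id = SecureRandom.uuid
entity = page_load_entity(page_load_id)

# Serialized form that could be embedded in the rendered page,
# so the client-side tracker can attach the same entity to its events.
puts JSON.generate(entity)
```

Because both trackers attach an identical entity, events from the same page load can later be joined on the shared id.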

On a website, the downside of using cookie-based identifiers such as domain_userid for user stitching is that cookies can be cleared or expire, and are sensitive to cookie rejection or adblocking. But the server-side trackers have access to other types of important data, for example stored user details. This could include a username, email address, or their geographic location – or even their favorite sports team. In this case, attaching all the relevant user data might be best kept server-side, by creating a “user” entity to attach to all server-side events.

However, as just described for page ID, a unique user ID could be shared with and set in the client-side events, as an entity. Take care not to expose any Personally Identifying Information if you are passing user data to the client, and remember that making extra requests for server-side data has the risk of slowing down your app.
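One way to keep personal data on the server is to build the client-facing entity from an explicit allowlist of fields. The sketch below assumes a hypothetical user record, schema URI, and helper names; it also assumes that every field other than the ID is optional in the schema.

```ruby
# Hypothetical schema URI; fields other than :id are assumed to be optional.
USER_SCHEMA = "iglu:com.example/user/jsonschema/1-0-0"

# Only these fields may ever be sent to the browser.
CLIENT_SAFE_FIELDS = %i[id].freeze

# Full entity for server-side events, where stored user details are available.
def server_user_entity(user)
  { schema: USER_SCHEMA, data: user }
end

# Reduced entity for client-side events: an opaque ID and nothing else.
def client_user_entity(user)
  { schema: USER_SCHEMA, data: user.slice(*CLIENT_SAFE_FIELDS) }
end

# Hypothetical stored user record; the field names are illustrative only.
user = { id: "user-8f3a2c", email: "skier@example.com", country: "NO", favorite_team: "Green Skis FC" }

p server_user_entity(user)[:data].keys
p client_user_entity(user)[:data]
```

An allowlist fails safe: a field added to the user record later stays server-side until someone deliberately marks it as shareable.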

Event context can be used for any data that’s important to you, not just user identifiers or page IDs. In the Rails demo app, we use the same “product” entity for both the client-side add-to-basket and server-side custom purchase events. This results in a consistent “product” table when viewing the processed events in the data warehouse, regardless of whether the event came from a client- or server-side tracker. All Snowplow trackers create events in the same format, so you don’t need different models for different event sources.

New Ruby tracker features to help you generate rich events

We’ve added several new features to the Ruby tracker to help you add data to your events. Read more about the changes on our Discourse forums.

The Ruby tracker can set various user parameters, called Subject properties, as “atomic” event properties. Many of these relate to client-side attributes, such as IP address, as mentioned earlier. From Ruby tracker version 0.7.0 onwards, you can set the cookie-derived domain_sessionid and domain_sessionidx atomic event properties in your Ruby server-side events. This works the same way as shown for domain_userid above.

To make it easier to track these Subject properties, you can now populate these atomic properties on an event-by-event basis. We’ve added the ability to attach a Subject object, which contains the relevant user parameters, to any track_x_event method call.

In this example, client-side JavaScript cookie values are extracted, saved into a Subject object, and propagated into the atomic fields of a page view event. Note that this example is not included in the demo, but assumes the same Singleton implementation for the Ruby tracker as used in the demo app.

# In a Rails Controller

def page_view_with_cookie_values
  user_event_subject = SnowplowTracker::Subject.new

  sp_cookie = cookies.find { |key, _value| key =~ /^_sp_id/ }
  if sp_cookie.present?
    sp_cookie_values = sp_cookie.last.split(".")

    user_event_subject.set_domain_user_id(sp_cookie_values[0])
    user_event_subject.set_domain_session_id(sp_cookie_values[5])
    user_event_subject.set_domain_session_idx(sp_cookie_values[2])
  end

  Snowplow.instance.tracker.track_page_view(page_url: request.original_url,
                                            subject: user_event_subject)
end
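The index-based parsing in that controller can also be pulled into a small plain-Ruby helper, which makes it easier to test without Rails. This is a sketch only: the helper name is hypothetical, the field positions (0, 2 and 5) are taken from the controller example above, and the cookie value is made up.

```ruby
# Positions within the dot-separated _sp_id cookie value,
# as used in the controller example above.
SP_ID_FIELDS = { domain_userid: 0, domain_sessionidx: 2, domain_sessionid: 5 }.freeze

# Return the known fields of an _sp_id cookie value as a hash of strings.
# Missing or empty fields are left out rather than set to nil.
def parse_sp_id_cookie(value)
  return {} if value.nil? || value.empty?

  parts = value.split(".")
  SP_ID_FIELDS.each_with_object({}) do |(name, index), parsed|
    field = parts[index]
    parsed[name] = field unless field.nil? || field.empty?
  end
end

# Made-up cookie value for illustration.
cookie_value = "4b5c2a10-1111-4222-8333-abcdefabcdef.1623000000.3.1623090000." \
               "1623080000.9d8e7f60-aaaa-4bbb-8ccc-123456789abc"

p parse_sp_id_cookie(cookie_value)
```

Note that every value comes back as a string, including the session index, so convert it if the receiving code expects a number.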

We’ve also created a new Page object, for website tracking. Pages store basic page data: the page URL, the referrer, and the page title. By attaching a Page object to any track_x_event method call, you can populate these page-related atomic properties for any event type. This means that with minimal coding effort, and no data modeling, you can know where in your website the event occurred.

# Adding Page data to a struct event

event_page = SnowplowTracker::Page.new(page_url: request.original_url)

Snowplow.instance.tracker.track_struct_event(category: 'forms',
                                             action: 'start-input',
                                             page: event_page)

Here’s a final, complex example that brings together all the ideas we’ve discussed. This time, let’s look at the generated events directly. A user has logged in to a website, generating a client-side JavaScript “form-submit” event, and a server-side Ruby “login-auth” event. They are both custom “self-describing” events. This event type is the most flexible, allowing you to determine exactly which parameters to track.

In this example, the JavaScript tracker has requested a unique page load ID and user ID from the server, which have been attached as entities. Using an event-specific Subject and Page, the Ruby tracker code has set various atomic properties: domain_userid, domain_sessionid, domain_sessionidx, useragent, and user_id. Again, the unique page load ID is attached as an entity, along with a user entity. Since the server can access user properties from the database, this user entity contains more details than in the JavaScript event.

Here is the example JavaScript event, as JSON, and here is the Ruby event. This JSON format is the output from a Snowplow Micro testing pipeline; full pipelines process events into database rows. Because the events are both in the same format, with many shared properties, it’s easy to stitch together and model the data. By tracking client-side and server-side, we can more fully understand the user’s behavior and activity.

Conclusion

By tracking different types of events in the most appropriate place, the tracking code has access to the relevant data already. This makes the tracking code easier to write. Use server-side tracking to be sure you’re receiving all the events, so that your dataset is accurate and complete.

Share properties between client-side and server-side trackers, using atomic event properties and event entities. Aim for every individual event to contain all the data you need to know who triggered it, when, and where it occurred. This will make your data easier to model and stitch together, saving time and money.

In summary, implementing both a client-side and server-side tracker in your app is a highly recommended, powerful method for producing the best possible behavioral dataset.
