Looking closer at the Request URL, you’ll notice data points like
v=1 noting this is the first visit, and
pageview&_s=1 noting the first pageview of the visit.
These limitations do present challenges, but ones that our users are accustomed to overcoming. Snowplow users are familiar with event-level data and implementing data modeling processes that stitch user identities across different devices, platforms, and channels, divide each user’s stream into segments and units of work, and apply their own custom attribution models. Because of this, the mirroring the hit-level data was the obvious choice and is the method we recommend for anyone who wants to warehouse their Google Analytics data and prizes control, granularity, and flexibility over simplicity.
The hit-level data collected through Google Analytics is sent to their servers where it is sampled and returned to your dashboards, ready to be analyzed. When you move data into your warehouse through the API, you’re storing the sampled, pre-formatted data that’s returned to you and that you can view through the user interface. Hit-level data, on the other hand, is the Google terminology for event-level data, a key ingredient in the type of sophisticated analytics we advocate for at Snowplow. The raw hit-level data, therefore, can answer much richer questions about your users and the activities they’re engaging in on your website like blogs read before submitting a lead form or time spent on each page in a given visit, all of which can be viewed at a user-by-user level.
Let’s look at examples of what can be done with these two different data types. If you’re accessing Google Analytics data through the API, Google does not provide an out of the box option for using cookie ID as a dimension or a filter, making it impossible to drill down to an individual or user level. In the absence of this, the GA API allows for pulling aggregate data that can be very useful in monitoring your web traffic.
Using Google Developer Tools Query Explorer, I attempted to run a query using as many metrics and dimensions as possible, looking to return the richest data possible. My first selection returned an error that these dimensions and metrics cannot be queried together- while Google Analytics has very detailed data, it needs to be combined in certain combinations. By modifying the dimensions and metrics in my query, I was able to generate results:
This returned the following table:
Because the Google Analytics API only lets you fetch aggregate data, while you can find out the number of visitors in a particular segment on a particular day on your website, fetch a variety of metrics for those users (such as source/medium, browser, or session duration), you can’t get at the underlying, user-level data to follow an individual user’s journey. With the limits on API requests and inability to pull data in real-time, you need to be very strategic with your queries which could present a challenge for organizations with many users with their hands in working with the data. The result is that using the Query Explorer to recreate an individual user’s journey can be difficult and, requiring multiple queries which then must be connected, can be resource intensive.
Using this example, Yali helped me pull an individual user’s journey from our Google Analytics data using Snowplow. To do that, we entered the following SQL query into Redash:
SELECT collector_tstamp, page_urlhost, page_urlpath, geo_country, geo_city FROM atomic.events e JOIN atomic.com_google_analytics_measurement_protocol_user_1 u ON u.root_id = event_id and u.root_tstamp = collector_tstamp WHERE collector_tstamp > '2017-02-01' AND u.root_tstamp > '2017-02-01' AND u.client_id = '1639187844.1490602369' ORDER BY 1;
This query collects the page URL, URL path, and geo-location for one given visitor to the website. The generated table gives you a breakdown of each page visited by that user within the given time frame.
Clicking through the eight pages of results returned by the query will let you retrace the user’s steps as they navigated your website. Adding in additional metrics or dimensions is as straightforward as modifying the SQL query, though that’s easier for some than others. However, even with the extra energy required to have data agency using Snowplow with Google Analytics, the resulting data sets have the potential to be exponentially more specific to your use case and therefore more meaningful in many situations.
If you’re a Google Analytics user hitting the limits of what you can currently do with your data because of challenges posed by aggregate data or you want to join your Google Analytics data with other data sets in a warehouse for analysis, contact us to learn more! We’re always excited to have a chat.