At the top of the page we have a graph showing the number of bad rows per run. Underneath we have a selection of the data itself, with the ability to drill into and explore any row in more detail:
Pick your time horizon
The first thing to do is choose the time horizon over which you want to examine your bad rows. By default, Kibana loads with a time horizon of only 15 minutes, but it can be useful to look at bad rows over a much longer period. By clicking on the top right corner of the screen we can select the time period to view:
We then select 30 days' worth of data:
Filtering out bad rows that we need not worry about
One of the interesting things that jumps out when you monitor the bad rows is that a fair amount of this data is generated not by Snowplow trackers or webhooks that failed to process (which would mean lost data), but by malicious bots pinging the internet looking for security vulnerabilities. Below is an example:
That was a request to our trial-collector.snplow.com/admin. The next request was generated by a bot trying to ping trial-collector.snplow.com/freepbx/admin/changes:
As these bad rows do not represent data that we wanted but failed to process, we can safely ignore them. To filter them out, we simply enter the following query into the Kibana search box at the top of the screen:
This removes all rows that represent requests to paths that the collector does not support.
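For illustration, such a filter might take the shape of the following Lucene-style query, assuming the raw request is stored in the bad row's line field (the paths shown are just the bot-probed paths from the examples above, not an exhaustive list):

```
NOT line:"/admin" AND NOT line:"/freepbx"
```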
Another type of bad row that you need not worry about looks like the following:
The remaining rows should all be genuine bad data, i.e. data generated by the trackers or webhooks that failed to process, so we need to drill into what is left and unpick the errors.
Diagnosing underlying data collection problems
Now that we’ve filtered out bad rows that we need not worry about, we can identify real issues with our tracking.
I recommend working through the following process:
1. Identify the first error listed
Inspecting the first error, we might find something like the following:
The above error message is caused by a failure to validate data against the associated schema. Specifically, two fields have been included in a JSON sent into Snowplow that are not allowed:
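To make the failure concrete, here is a minimal, self-contained Python sketch of the check a JSON Schema validator effectively performs when a schema sets additionalProperties to false; the field names are hypothetical:

```python
import json

# Properties declared in the (hypothetical) schema; anything else is rejected
schema_properties = {"sessionId", "userId"}

# A payload like the one a tracker might send, with two undeclared fields
payload = json.loads("""
{"sessionId": "abc-123", "userId": "u-42",
 "newField1": true, "newField2": "oops"}
""")

# Keys present in the payload but absent from the schema cause the
# validation failure seen in the bad row's error message
disallowed = sorted(set(payload) - schema_properties)
print(disallowed)  # → ['newField1', 'newField2']
```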
2. Identify how many bad rows are caused by this error
Now that we've identified a tracker error, we want to understand how prevalent it is. We can do that by simply updating our Kibana query to return only rows with this type of error message:
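For illustration, assuming the validation failure surfaces in the bad row's errors field with a message along the lines of "object instance has properties which are not allowed by the schema", the query might look like:

```
errors:"object instance has properties which are not allowed by the schema"
```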
In our case, we can see that this error was only introduced yesterday, but that since then it accounts for almost 2500 bad rows:
Addressing this issue is essential. Fortunately, it should be straightforward: most likely we need to create a new version of the schema that allows the two fields, and update the tracker code to reference the new schema version in the relevant self-describing JSONs.
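As a sketch (the vendor and event names here are hypothetical), the tracker payload would move from referencing version 1-0-0 of the schema to a new 1-0-1 that declares the two extra fields:

```
{
  "schema": "iglu:com.acme/my_event/jsonschema/1-0-1",
  "data": {
    "existingField": "value",
    "newField1": true,
    "newField2": "value"
  }
}
```

Bumping only the last component of the version signals a backwards-compatible addition, so events already validating against 1-0-0 are unaffected.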
3. Filter out those bad rows and repeat
Now that we've dealt with the first source of bad rows, let's identify the second. This is easy: we update our Kibana query to filter out the bad rows we were exploring above:
and in addition filter out the bad rows that we did not need to worry about:
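Combining both filters, the resulting Kibana query might look something like this (the field names and message text are assumptions based on the examples above):

```
NOT errors:"object instance has properties which are not allowed by the schema" AND NOT line:"/admin" AND NOT line:"/freepbx"
```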
We repeat the above process until all the sources of bad data have been identified and dealt with!