Snowplow pipelines set aside problematic events so that the data reaching downstream systems stays high quality. For more information, see the documentation on understanding failed events.
Snowplow customers who want to see aggregates of these failed events by type can enable an optional feature in the Insights console.
This interface gives a quick view of the volume of events that reach the pipeline but fail, broken down by the reason they failed.
At the top of the interface (currently for AWS customers only) is a data quality score that compares the volume of failed events with the volume of events successfully loaded into the data warehouse.
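The exact formula behind the score is not given here. As a minimal sketch, assuming the score is simply the percentage of events that were loaded rather than failed over the same period, it could be computed as follows (the function name and the handling of an empty period are hypothetical):

```python
from typing import Optional


def data_quality_score(loaded: int, failed: int) -> Optional[float]:
    """Hypothetical score: percentage of events loaded rather than failed.

    Assumes score = loaded / (loaded + failed); the console's actual formula may differ.
    Returns None when there were no events at all, since no ratio can be formed.
    """
    if loaded < 0 or failed < 0:
        raise ValueError("event counts must be non-negative")
    total = loaded + failed
    if total == 0:
        return None
    return 100.0 * loaded / total


if __name__ == "__main__":
    # 9,900 events loaded and 100 failed in the same period.
    print(data_quality_score(9_900, 100))
```

With 9,900 loaded and 100 failed events this sketch gives a score of 99.0.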
In the table, failed events are aggregated by failure type (e.g. validation, adapter) and by specific error message (e.g. schema not found, MAX length exceeded).
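The grouping in the table amounts to counting failed events per (failure type, error message) pair. The sketch below assumes a hypothetical, flattened `FailedEvent` record with `failure_type`, `error_message` and `app_id` fields; real Snowplow failed events carry a richer payload, and the sample messages are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FailedEvent:
    # Hypothetical, simplified record; not the actual Snowplow failed-event format.
    failure_type: str               # e.g. "validation", "adapter"
    error_message: str              # e.g. "schema not found"
    app_id: Optional[str] = None    # extra metadata, when available


def aggregate_failures(events: Iterable[FailedEvent]) -> Counter:
    """Count failed events per (failure type, error message) pair."""
    return Counter((e.failure_type, e.error_message) for e in events)


def rows_by_volume(counts: Counter) -> List[Tuple[str, str, int]]:
    """Flatten the counts into table rows, highest volume first."""
    return [(ftype, msg, n) for (ftype, msg), n in counts.most_common()]


if __name__ == "__main__":
    sample = [
        FailedEvent("validation", "schema not found", "web"),
        FailedEvent("validation", "schema not found", "ios"),
        FailedEvent("validation", "MAX length exceeded", "web"),
        FailedEvent("adapter", "unsupported vendor", "web"),  # illustrative message
    ]
    for row in rows_by_volume(aggregate_failures(sample)):
        print(row)
```

Keying on the pair rather than on the failure type alone keeps distinct root causes (for example, a missing schema versus an over-long field) in separate rows, even when they share the same failure type.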
Select a particular error to see more detail.
The detailed view shows the error message together with other useful metadata, such as app_id when it is available, to help you quickly diagnose the source and root cause of the error.
Additional infrastructure cost
To populate this screen, an additional microservice runs on your infrastructure and aggregates failures as they occur in your pipeline.
Its cost is currently estimated to start at $160/month on AWS and $125/month on GCP, and may vary with the volume of failed events and with spikes in that volume.
For this reason, the feature is an optional addition to the console.