
Accessing failed events

Note

This documentation is for pipeline versions R118+. If you are unsure of which version your pipeline is running, please contact support.

Failed events are preserved by writing them to cloud storage:

  • in S3 if your pipeline runs on AWS
  • in Google Cloud Storage (GCS) if your pipeline runs on GCP

If your pipeline uses Elasticsearch, the failed events are also loaded into your “bad” index. This will be the case if you have enabled Elasticsearch in your production pipeline, or if you use Snowplow Mini.

Even if your pipeline does not have Elasticsearch, you can still write SQL queries to investigate your failed events in cloud storage, using Athena or BigQuery external tables.
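
For example, a minimal Athena query over failed events stored in S3 might look like the sketch below. It assumes you have already created an external table over the relevant S3 prefix; the table name schema_violations_bad_rows is illustrative, and the nested columns shown (data.failure, data.payload) follow the failed event JSON structure.

  -- Inspect a handful of schema violations
  -- (schema_violations_bad_rows is a hypothetical external table
  -- defined over the S3 location of your failed events)
  SELECT
    data.failure,
    data.payload
  FROM schema_violations_bad_rows
  LIMIT 10;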

Failed events are stored in the same format whether in AWS S3 or in GCS. Each file contains newline-delimited JSON objects. The files are partitioned into directories, first by failed event type and then by date.

AWS example
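
A hypothetical S3 layout following that partitioning scheme (the bucket name, prefixes and dates below are placeholders, not the exact paths your pipeline produces):

  s3://my-bad-rows-bucket/schema_violations/2021-01-01/part-00000
  s3://my-bad-rows-bucket/schema_violations/2021-01-02/part-00000
  s3://my-bad-rows-bucket/adapter_failures/2021-01-01/part-00000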
GCS example
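
The equivalent hypothetical layout in GCS:

  gs://my-bad-rows-bucket/schema_violations/2021-01-01/part-00000
  gs://my-bad-rows-bucket/adapter_failures/2021-01-01/part-00000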

For Elasticsearch, rows are loaded into the ‘bad’ index. Below is an example of a schema violation error, with specific focus on the data.failure.messages object.

Elasticsearch example
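
The document below is a hand-written sketch of the overall shape of such a row, trimmed to highlight where data.failure.messages sits; the schema version, field values and inner message details are placeholders, and the exact structure is defined by the schema_violations schema itself.

  {
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/...",
    "data": {
      "processor": { "artifact": "...", "version": "..." },
      "failure": {
        "timestamp": "2021-01-01T12:00:00.000Z",
        "messages": [
          {
            "schemaKey": "iglu:com.acme/my_event/jsonschema/1-0-0",
            "error": "..."
          }
        ]
      },
      "payload": { "...": "..." }
    }
  }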
