Failed events are preserved by writing them to cloud storage:
- in S3 if your pipeline runs on AWS
- in Google Cloud Storage (GCS) if your pipeline runs on GCP
If your pipeline uses Elasticsearch, the failed events are also loaded into your “bad” index. This is the case if you have enabled Elasticsearch in your production pipeline, or if you use Snowplow Mini.
Even if your pipeline does not have Elasticsearch, you can still write SQL queries against your failed events in cloud storage by using Athena or BigQuery external tables.
Failed events are stored in the same format whether in S3 or in GCS. Each file contains newline-delimited JSON objects. The files are partitioned into directories, first by failed event type, and then by date.
In Elasticsearch, rows are loaded into the ‘bad’ index. Below is an example of a schema violation error, with specific focus on the