When failed events are generated on your pipeline, the raw event payload, along with details about the failure, is saved into file storage (S3 on AWS, GCS on Google Cloud).
You can directly access and download examples of events that are failing from file storage. This is useful for further investigation, and is also required to design a recovery using the Recovery Builder.
Retrieving raw data from S3 on AWS
- Log in to your AWS Console account and navigate to the sub-account that contains your Snowplow pipeline
- Navigate to your S3 storage buckets
- You should find a bucket with a name ending in `-kinesis-s3-bad`, and within that a folder with your pipeline name
- Navigate into this folder and you should see `partitioned` (search for it if it isn’t visible), and within this a folder for each type of failed event. Select the relevant type for the failed events you wish to find.
- You can now browse the folders by date and time to find a batch of failed events from the period you are interested in.
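The console steps above can also be scripted. Below is a minimal sketch of building the date-partitioned key prefix for one batch of failed events; the pipeline folder name, failure-type folder name, and the year/month/day/hour layout are assumptions for illustration, so confirm the exact folder naming in your own bucket first:

```python
from datetime import datetime, timezone

def failed_event_prefix(pipeline: str, failure_type: str, when: datetime) -> str:
    """Build the S3 key prefix for one hour's batch of failed events.

    Assumes the layout described above:
    <pipeline>/partitioned/<failure type>/ followed by date/time folders.
    Check your own bucket, as the folder naming can vary.
    """
    return (
        f"{pipeline}/partitioned/{failure_type}/"
        f"{when:%Y}/{when:%m}/{when:%d}/{when:%H}/"
    )

# Hypothetical pipeline name and failure type -- substitute your own.
prefix = failed_event_prefix(
    "my-pipeline",
    "schema_violation",
    datetime(2023, 5, 1, 14, tzinfo=timezone.utc),
)
print(prefix)  # my-pipeline/partitioned/schema_violation/2023/05/01/14/
# With the AWS CLI you could then download that batch, e.g.:
#   aws s3 cp "s3://<your bucket>-kinesis-s3-bad/<prefix>" . --recursive
```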
Retrieving raw data from GCS on GCP
- Log in to your Google Cloud Platform account and navigate to the project that contains your Snowplow pipeline
- Navigate to your Google Cloud Storage buckets
- You should find a bucket named with a prefix of
- Navigating into this, you should see `partitioned`, and within this a folder for each type of failed event. Select the relevant type for the failed event you wish to find.
- You can now drill down by year, month, day, and hour to find a batch of failed events that occurred in that period.
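The same year/month/day/hour drill-down can be turned into a `gsutil` listing command. A minimal sketch, where the bucket name and failure type are hypothetical and the folder layout is assumed to match the drill-down described above:

```python
from datetime import datetime

def gsutil_ls_command(bucket: str, failure_type: str, when: datetime) -> str:
    """Return a gsutil command listing one hour's batch of failed events.

    The bucket name and the partitioned/<type>/year/month/day/hour
    layout are assumptions -- confirm them in the GCS console first.
    """
    return (
        f"gsutil ls gs://{bucket}/partitioned/{failure_type}/"
        f"{when:%Y}/{when:%m}/{when:%d}/{when:%H}/"
    )

# Hypothetical bucket and failure type -- substitute your own.
cmd = gsutil_ls_command("my-bad-rows-bucket", "schema_violation",
                        datetime(2023, 5, 1, 14))
print(cmd)
# To download instead of list, swap "ls" for "cp -r <url> ." in the shell.
```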
Once you find the raw files, you can download them and view them in a text editor.
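Failed event files are typically newline-delimited JSON, one failed event per line, and may be gzip-compressed; both of those details are assumptions here, so check the files you actually downloaded. A minimal sketch for loading such a file for inspection:

```python
import gzip
import json

def read_failed_events(path: str):
    """Yield failed events from a downloaded file.

    Assumes newline-delimited JSON, optionally gzip-compressed
    (".gz" suffix) -- verify against your own files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Tiny self-contained demo with a synthetic two-event file:
import os
import tempfile

demo = '{"failure": "example"}\n{"failure": "another"}\n'
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write(demo)
events = list(read_failed_events(tmp.name))
os.unlink(tmp.name)
print(len(events))  # 2
```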