If you’re upgrading from Snowplow pre-R119 and S3 Loader pre-0.7.0, you must upgrade to 0.7.0 or 1.0.0 first in order to split the bad data produced during the transition period.
In Snowplow R119 we introduced a new self-describing bad rows format. S3 Loader 0.7.0 was the first version capable of partitioning self-describing data based on its schema. Versions 0.7.0 and 1.0.0 can recognize at runtime whether the old or the new format is being consumed and use the `partitionedBucket` output path only when necessary, so both formats can be consumed.
S3 Loader 2.0.0 supports only the new self-describing format and raises exceptions if legacy bad data is pushed.
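The difference between the two behaviours can be illustrated with a minimal Python sketch. It is not the Loader's actual implementation (the S3 Loader is a JVM application). The function names, the `strict` flag and the output prefixes are hypothetical. The only fact assumed from outside this page is the general shape of an Iglu schema URI, `iglu:vendor/name/format/model-revision-addition`.

```python
import json
import re

# Iglu schema URIs look like: iglu:vendor/name/format/model-revision-addition
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.\-]+)/(?P<name>[\w.\-]+)/(?P<format>[\w.\-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


class LegacyBadRowError(ValueError):
    """Raised in strict mode when a record is not self-describing JSON."""


def schema_of(record: str):
    """Return a regex match for the record's Iglu URI, or None for legacy data."""
    try:
        payload = json.loads(record)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "data" not in payload:
        return None
    schema = payload.get("schema")
    if not isinstance(schema, str):
        return None
    return IGLU_URI.match(schema)


def output_prefix(record: str, strict: bool = False) -> str:
    """Pick an output prefix for a record.

    strict=False mimics the 0.7.0/1.0.0 behaviour described above: legacy
    records fall back to an unpartitioned path. strict=True mimics 2.0.0:
    legacy records raise. The prefix layout itself is a made-up example.
    """
    match = schema_of(record)
    if match is None:
        if strict:
            raise LegacyBadRowError("record is not self-describing JSON")
        return "unpartitioned"
    return f"partitioned/{match['vendor']}.{match['name']}/model={match['model']}"
```

Calling `output_prefix(record)` on a legacy bad row returns the fallback prefix, while `output_prefix(record, strict=True)` raises, which is why the intermediate upgrade is needed to drain legacy data first.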
In 2.0.0 the S3 Loader went through a major configuration refactoring. A sample config is available in the GitHub repository.
- No more `aws` property allowing credentials to be hardcoded; the default credentials chain is used
- NSQ support has been dropped
- Instead of `s3`, the topology is now represented as `input` (Kinesis Stream) and `output` (S3 bucket and a Kinesis Stream for bad data)
- `partitionedBucket` property has been removed (see Caution above)
- `purpose` property allowing the Loader to recognize the data it works with:
  - `ENRICHED` for enriched TSVs, enabling latency monitoring,
  - `SELF_DESCRIBING` generally for any self-describing JSON, but usually used for bad rows and
- `metrics.sentry.dsn` can be used to track exceptions, including internal KCL exceptions
- `metrics.statsd` can be used to send observability data to a StatsD-compatible server
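To show why the Loader needs to know the purpose of its data, here is a small Python sketch of latency monitoring for enriched TSVs. It is an illustration only, not the Loader's code. The `Purpose` enum models just the two values named above, and the position and format of the collector timestamp in the TSV are assumptions made for this example.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Purpose(Enum):
    # Only the two values named in the list above are modelled here.
    ENRICHED = "ENRICHED"
    SELF_DESCRIBING = "SELF_DESCRIBING"


# Assumption: the collector timestamp is the fourth tab-separated field
# and is formatted as "YYYY-MM-DD HH:MM:SS.fff" in UTC.
COLLECTOR_TSTAMP_INDEX = 3
TSTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def latency_seconds(record: str, purpose: Purpose, now: datetime) -> Optional[float]:
    """Return seconds between collection and `now`, or None if not measurable.

    Latency can only be derived when the data is known to be enriched TSV;
    for other purposes there is no timestamp column to read.
    """
    if purpose is not Purpose.ENRICHED:
        return None
    fields = record.split("\t")
    if len(fields) <= COLLECTOR_TSTAMP_INDEX or not fields[COLLECTOR_TSTAMP_INDEX]:
        return None
    collected = datetime.strptime(
        fields[COLLECTOR_TSTAMP_INDEX], TSTAMP_FORMAT
    ).replace(tzinfo=timezone.utc)
    return (now - collected).total_seconds()
```

A value like this is the kind of observability data that could then be reported to a StatsD-compatible server.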