Unlike the Spark transformer, the stream transformer reads data directly from the enriched Kinesis stream and does not use Spark or EMR. It’s a plain JVM application, like Stream Enrich or S3 Loader.
Reading directly from Kinesis means that the transformer can bypass the s3DistCp staging / archiving step.
Another benefit is that it doesn’t process a bounded data set and can emit transformed folders based only on its configured frequency. This means the pipeline loading frequency is limited only by the storage target.
Downloading the artefact
The asset is published as a jar file attached to the GitHub release for each version.
It’s also available as a Docker image on Docker Hub under
The transformer takes two configuration files:
config.hocon file with application settings
iglu_resolver.json file with the resolver configuration for your Iglu schema registry.
See here for details on how to prepare the Iglu resolver file.
NOTE: All self-describing schemas for events processed by the transformer must be hosted on Iglu Server 0.6.0 or above. Iglu Central is a registry containing Snowplow-authored schemas. If you want to use them alongside your own, you will need to add Iglu Central to your resolver file. Keep in mind that it could override your own private schemas if you give it higher priority. For details on this see here.
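As a rough illustration of a resolver file that lists Iglu Central alongside a private registry, the sketch below writes a minimal iglu_resolver.json. The private registry name, its URI (iglu.example.com), the API key placeholder, the vendor prefix and the priority values are all hypothetical and only show where each setting goes. Check the Iglu resolver documentation for how priority is interpreted before relying on any particular ordering.

```shell
# Sketch only: writes an illustrative iglu_resolver.json into the current directory.
# "Private registry", its URI, the apikey and both priority values are placeholders.
cat > iglu_resolver.json <<'EOF'
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Private registry",
        "priority": 0,
        "vendorPrefixes": ["com.example"],
        "connection": {
          "http": {
            "uri": "https://iglu.example.com/api",
            "apikey": "REPLACE_ME"
          }
        }
      },
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      }
    ]
  }
}
EOF
```

The two repositories are given different priority values so that the ordering between your own schemas and Iglu Central is explicit rather than accidental.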
Running the stream transformer
The two config files need to be passed in as base64-encoded strings:
$ docker run snowplow/transformer-kinesis:4.0.2 \
  --iglu-config $RESOLVER_BASE64 \
  --config $CONFIG_BASE64
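One possible way to produce the two base64 strings is sketched below. The helper name encode_file is hypothetical and is not part of the transformer. It pipes the output of base64 through tr to strip line breaks, since some base64 implementations wrap long output and a wrapped value would not survive being passed as a single argument.

```shell
# Hypothetical helper: base64-encode a file as a single line.
# Newlines are stripped because some base64 implementations wrap their output.
encode_file() {
  base64 < "$1" | tr -d '\n'
}
```

With both config files in the current directory, the variables used in the docker command could then be set with CONFIG_BASE64=$(encode_file config.hocon) and RESOLVER_BASE64=$(encode_file iglu_resolver.json).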