Stream transformer

info

For a high-level overview of the Transform process, see Transforming enriched data. For guidance on picking the right transformer app, see How to pick a transformer.

Unlike the Spark transformer, the stream transformer reads data directly from the enriched stream and does not use Spark or EMR. It's a plain JVM application, like Stream Enrich or S3 Loader.

Reading directly from the stream means that the transformer can bypass the S3DistCp staging and archiving step.

Another benefit is that the stream transformer does not process a bounded data set. It emits transformed folders based only on its configured frequency, so the pipeline's loading frequency is limited only by the storage target.
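To illustrate frequency-based emission, here is a minimal sketch of how an unbounded stream can be cut into fixed-size time windows, with each window becoming one output folder. This is not the transformer's actual code: the function names, the `window_seconds` parameter and the `run=...` folder naming are assumptions made for the example.

```python
from datetime import datetime, timezone


def window_start(arrival: datetime, window_seconds: int) -> datetime:
    """Floor a timezone-aware arrival time to the start of its window."""
    epoch = int(arrival.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def folder_for(arrival: datetime, window_seconds: int) -> str:
    """Hypothetical folder name derived from the window start time."""
    return window_start(arrival, window_seconds).strftime("run=%Y-%m-%d-%H-%M-%S")


def group_by_window(events, window_seconds: int) -> dict:
    """Group (arrival_time, payload) pairs by the folder of their window."""
    folders = {}
    for arrival, payload in events:
        folders.setdefault(folder_for(arrival, window_seconds), []).append(payload)
    return folders
```

In this sketch, a smaller `window_seconds` produces folders more often, which is why the loading frequency ends up bounded by how quickly the storage target can ingest them rather than by the size of a batch.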

The stream transformer has three variants: Transformer Kinesis, Transformer Pubsub and Transformer Kafka, for AWS, GCP and Azure respectively.