Enrichment

This is the technical documentation for Enrichment. If you are looking to configure an enrichment, check the guides for managing enrichments as part of your pipeline.

The Snowplow Enrichment step takes the raw log files generated by the Snowplow collectors, cleans up the data, and enriches it so that it is:

  1. Ready to be analysed using EMR
  2. Ready to be uploaded into Amazon Redshift, PostgreSQL or another storage mechanism for analysis

There are currently three Enrichment processes available for setup:

EmrEtlRunner
An application that parses logs from a Collector and stores the enriched events in S3

Stream Enrich (for AWS)
A Scala application that reads Thrift events from a Kinesis stream and writes the enriched events back to a Kinesis stream

Beam Enrich (for GCP)
An Apache Beam application that reads Thrift events from a Pub/Sub topic and writes the enriched events back to a Pub/Sub topic
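
All three processes share the same shape: read raw events, tidy them up, apply enrichments, and write the result out. The sketch below illustrates only that shape and is not Snowplow code. It assumes events are plain dictionaries rather than Thrift records, uses Python iterables in place of Kinesis streams or Pub/Sub topics, and every name in it (`tidy`, `add_domain`, `enrich_stream`, `page_url`, `page_domain`) is hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List
from urllib.parse import urlparse

# Assumption: an event is a flat mapping of field name to string value.
Event = Dict[str, str]
Enrichment = Callable[[Event], Event]


def tidy(raw: Event) -> Event:
    """Strip surrounding whitespace and drop fields that end up empty."""
    return {k: v.strip() for k, v in raw.items() if v is not None and v.strip()}


def add_domain(event: Event) -> Event:
    """Hypothetical enrichment: derive `page_domain` from `page_url`."""
    enriched = dict(event)  # copy so the input event is not mutated
    if "page_url" in event:
        host = urlparse(event["page_url"]).hostname
        if host:
            enriched["page_domain"] = host
    return enriched


def enrich_stream(
    raw_events: Iterable[Event], enrichments: List[Enrichment]
) -> Iterator[Event]:
    """Tidy each raw event, then apply every enrichment in order."""
    for raw in raw_events:
        event = tidy(raw)
        for enrichment in enrichments:
            event = enrichment(event)
        yield event
```

A possible usage: `list(enrich_stream([{"page_url": " https://example.com/pricing ", "referrer": ""}], [add_domain]))` yields one event with the whitespace removed, the empty `referrer` dropped, and a `page_domain` field added. Keeping each enrichment a pure function from event to event makes the steps easy to reorder or test in isolation.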
