Pipeline Components and Applications

Enrichment

This is the technical documentation for Enrichment. If you are looking to configure an enrichment, check the guides for managing enrichments as part of your pipeline.

In a Snowplow pipeline, Enrichment is the step that reads raw payloads from the stream populated by the collector and validates that each event conforms to its expected schema. It then enriches (widens) the event with additional fields, e.g. by inferring geolocation from the user’s IP address, and writes the enriched event to another stream.
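The read, validate, enrich and write flow described above can be sketched in a few lines of Python. This is only an illustration, not Snowplow's implementation: the streams are plain in-memory lists, and `REQUIRED_FIELDS`, `GEO_BY_PREFIX`, `geo_country` and every function name are hypothetical. Counting rejected events is also an assumption made for the sketch.

```python
from typing import Iterable, Optional

# Hypothetical schema: required field names and their types.
# This is NOT Snowplow's real schema format.
REQUIRED_FIELDS = {"event_id": str, "user_ipaddress": str}

# Hypothetical IP-prefix-to-country table standing in for a real
# geolocation database (the prefixes are documentation-only ranges).
GEO_BY_PREFIX = {"203.0.113.": "AU", "198.51.100.": "US"}


def validate(event: dict) -> bool:
    """Return True if every required field is present with the expected type."""
    return all(
        isinstance(event.get(name), kind)
        for name, kind in REQUIRED_FIELDS.items()
    )


def lookup_country(ip: str) -> Optional[str]:
    """Return the country for a known IP prefix, or None if there is no match."""
    for prefix, country in GEO_BY_PREFIX.items():
        if ip.startswith(prefix):
            return country
    return None


def enrich(event: dict) -> dict:
    """Return a widened copy of the event with an added geo_country field."""
    enriched = dict(event)
    enriched["geo_country"] = lookup_country(event["user_ipaddress"])
    return enriched


def run_enrichment(raw_stream: Iterable[dict], enriched_stream: list) -> int:
    """Validate and enrich each raw event, appending results to enriched_stream.

    Returns the number of events that failed validation (an assumption of
    this sketch; how failures are handled is not covered here).
    """
    rejected = 0
    for event in raw_stream:
        if not validate(event):
            rejected += 1
            continue
        enriched_stream.append(enrich(event))
    return rejected
```

Calling `run_enrichment(raw, out)` with one well-formed event and one missing `user_ipaddress` appends a single widened event to `out` and returns `1`. In a real deployment the two lists would be replaced by the input and output streams of the chosen platform.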

There are currently three Enrichment processes available for setup:

Stream Enrich (for AWS)
An application that reads Thrift events from a Kinesis stream and writes its output back to a Kinesis stream.

Enrich PubSub (for GCP)
A standalone JVM application that reads Thrift events from a PubSub topic and writes its output back to PubSub.

Beam Enrich (for GCP)
Similar to Enrich PubSub, this application reads events from and writes events to PubSub topics. It is implemented using Apache Beam and is deployed as a Dataflow job.
