Pipeline Components and Applications

RDB (Relational Database) Loader is a pair of applications that work in tandem to load Snowplow events into a Redshift cluster.

  1. The RDB Shredder is a Spark job that reads enriched events from S3 and shreds them into separate entities. It also deduplicates events.
  2. The loader itself is a standalone application that executes the SQL statement that copies the shredded entities into Redshift.
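
The two steps above can be pictured with a small sketch. It is a conceptual illustration only, not the RDB Shredder or loader code: the real Shredder is a Spark job, and the event layout, the table names, the helper functions (`deduplicate`, `shred`, `copy_statement`), the bucket path and the IAM role are all assumptions made up for this example. Treating events with the same `event_id` and `event_fingerprint` as duplicates is likewise a simplifying assumption.

```python
from collections import defaultdict


def deduplicate(events):
    """Keep the first event seen for each (event_id, event_fingerprint) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["event_id"], event["event_fingerprint"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def shred(events):
    """Split each event into an atomic row plus one row per attached entity.

    Rows are grouped by a hypothetical target table name. Each entity row
    carries a root_id pointing back to its parent event.
    """
    tables = defaultdict(list)
    for event in events:
        atomic = {k: v for k, v in event.items() if k != "contexts"}
        tables["events"].append(atomic)
        for context in event.get("contexts", []):
            row = dict(context["data"], root_id=event["event_id"])
            tables[context["schema"]].append(row)
    return dict(tables)


def copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY statement for one shredded entity (illustrative)."""
    return f"COPY {table} FROM '{s3_path}' IAM_ROLE '{iam_role}' JSON 'auto'"


# Hypothetical enriched events; the second one duplicates the first.
enriched = [
    {"event_id": "e1", "event_fingerprint": "f1", "app_id": "web",
     "contexts": [{"schema": "com_acme_user_1", "data": {"plan": "free"}}]},
    {"event_id": "e1", "event_fingerprint": "f1", "app_id": "web",
     "contexts": [{"schema": "com_acme_user_1", "data": {"plan": "free"}}]},
    {"event_id": "e2", "event_fingerprint": "f2", "app_id": "web",
     "contexts": []},
]

tables = shred(deduplicate(enriched))
statements = [
    copy_statement(name, f"s3://example-bucket/shredded/{name}/",
                   "arn:aws:iam::123456789012:role/ExampleRole")
    for name in sorted(tables)
]
```

Here `tables` ends up with one list of rows per target table, and `statements` holds one COPY statement per table, which mirrors the division of work between the two applications: the first produces per-entity data, the second issues the SQL that loads it.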

Before setting up RDB Loader, it is recommended to set up and launch a Redshift cluster first.