On an AWS pipeline, the Snowplow Stream Collector receives events sent over HTTP(S) and writes them to a raw Kinesis stream. From there, the data is picked up and processed by the Snowplow validation and enrichment job.

On an AWS pipeline, the basic steps are:

  1. In the AWS console, create two Kinesis streams to which the collector will write good payloads and bad events.
  2. [Optional] Set up an SQS buffer to handle spikes in traffic.
  3. Configure and run the collector, using the main collector documentation, which describes the core concepts of how the collector works and its configuration options.
  4. [Optional] Configure and run Sqs2kinesis, which moves data from your SQS buffer back to the primary Kinesis stream.
  5. [Optional] Sink the raw data to S3 using the Snowplow S3 loader. This is recommended in production to keep a copy of the raw data before any processing.
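
Step 1 can also be scripted instead of done through the AWS console. The following is a minimal sketch, not an official Snowplow tool: it assumes the `boto3` library is installed and AWS credentials are configured, and the stream names `raw-good` and `raw-bad`, the region, and the shard count of 1 are hypothetical placeholders to replace with your own values.

```python
def ensure_streams(kinesis, names, shard_count=1):
    """Create each named Kinesis stream unless it already exists.

    `kinesis` is a boto3 Kinesis client. Returns the names that were
    actually created by this call.
    """
    created = []
    for name in names:
        try:
            kinesis.create_stream(StreamName=name, ShardCount=shard_count)
            created.append(name)
        except kinesis.exceptions.ResourceInUseException:
            # A stream with this name already exists; leave it untouched.
            pass
    return created


def main():
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    # Hypothetical region and stream names: adjust to your own pipeline.
    kinesis = boto3.client("kinesis", region_name="eu-west-1")
    created = ensure_streams(kinesis, ["raw-good", "raw-bad"])
    print("Created streams:", created)
```

Calling `main()` creates both streams on first run and does nothing on later runs. Stream creation is asynchronous, so a stream may take a short time to become active before the collector can write to it.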