Install the Scala Stream Collector

The Scala Stream Collector allows near-real-time processing of a Snowplow event stream. The collector receives events and delivers them to Amazon Kinesis, Google PubSub, Apache Kafka, NSQ, or stdout for a custom stream collection process. AWS users should configure the Scala Stream Collector to output to a Kinesis stream (which we call the “raw” stream). They should then set up the enricher to consume the raw events from this stream and write them out to a second, “enriched” stream.

1. Dependencies

Version 8 (aka 1.8) of the Java Runtime Environment

or

Docker
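As a quick check before continuing, the sketch below reports which of the two dependencies is present on the machine. The helper name have_cmd is hypothetical and only used for illustration; java -version and docker --version are the standard version flags of those tools.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  # java -version writes to stderr; Java 8 reports itself as version 1.8.x
  java -version 2>&1 | head -n 1
else
  echo "java not found"
fi

if have_cmd docker; then
  docker --version
else
  echo "docker not found"
fi
```

Only one of the two is needed: Java for the jarfile route (2b), Docker for the container route (2a).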

2a.1 Installing the Docker container

The Scala Stream Collector is published as a Docker image. Please see our Hosted assets page for details.

Example pull command:

docker pull snowplow/scala-stream-collector-kinesis

2b.1 Installing the jarfile

You can choose to either:

  1. Download the Scala Stream collector jarfile, or:
  2. Compile it from source

2b.2 Download the jarfile

To get a local copy, you can download the jarfile directly from our hosted assets bucket on Amazon S3 – please see our Hosted assets page for details.

2b.3 Compile from source

Alternatively, you can build the collector from source. You will need Scala and sbt installed.

First, clone the Snowplow Stream Collector repo:

$ git clone https://github.com/snowplow/stream-collector.git

Use sbt to resolve dependencies, compile the source, and build an assembled fat JAR file with all dependencies.

$ sbt "project *targeted platform*" assembly

where targeted platform can be:

  • kinesis
  • google-pubsub
  • kafka
  • nsq
  • stdout
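To make the substitution concrete, here is a minimal sketch of a wrapper that checks the platform name against the list above before invoking sbt. The function names is_supported_platform and build_collector are hypothetical, and the wrapper assumes it is run from the root of the cloned repository.

```shell
# is_supported_platform: hypothetical helper that accepts only the
# platform names listed in this section.
is_supported_platform() {
  case "$1" in
    kinesis|google-pubsub|kafka|nsq|stdout) return 0 ;;
    *) return 1 ;;
  esac
}

# build_collector: hypothetical wrapper around the sbt command shown above.
# Assumes the current directory is the root of the cloned repo.
build_collector() {
  platform="$1"
  if ! is_supported_platform "$platform"; then
    echo "unsupported platform: $platform" >&2
    return 1
  fi
  sbt "project $platform" assembly
}

# Example: build_collector kinesis
```

For AWS users targeting Kinesis, this amounts to running sbt "project kinesis" assembly.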

The jar file will be saved as snowplow-scala-collector-[targeted platform]-[version].jar in the [targeted platform]/target/scala-2.12 subdirectory – it is now ready to be deployed.
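The naming scheme above can be turned into a small helper for locating the built jar. This is only a sketch: jar_pattern is a hypothetical name, and the pattern simply mirrors the file name and directory given in this section, with a wildcard standing in for the version.

```shell
# jar_pattern: hypothetical helper that prints the glob pattern for the
# assembled jar of a given platform, following the naming scheme above.
jar_pattern() {
  printf '%s\n' "$1/target/scala-2.12/snowplow-scala-collector-$1-*.jar"
}

# Example (from the repo root, after a successful build):
#   ls $(jar_pattern kinesis)
```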
