Install the Scala Stream Collector

The Scala Stream Collector allows near-real-time processing of a Snowplow event stream. Events are received by the collector and delivered to Amazon Kinesis, Google PubSub, Apache Kafka, NSQ, Amazon SQS, or stdout for a custom stream collection process. AWS users should configure the Scala Stream Collector to output to a Kinesis stream (which we call the “raw” stream). They should then set up the enricher to consume the raw events from this stream and write them out to a second, “enriched” stream.

On AWS, there is also the option to configure SQS as the sink for raw events. However, there is currently no direct route for these events into the enrich process. Users can write their own tool, or they can use the sqs2kinesis application to move the data from SQS to Kinesis and then continue to set up Enrichment as described below.

Dependencies

  • Version 11 of the Java Runtime Environment, or
  • Docker
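
If you are unsure what is available on your machine, you can check with the standard version commands (assuming java and docker are on your PATH):

java -version
docker --version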

Installing the Docker container

The Scala Stream Collector is published as a Docker image; please see our Hosted assets page for details.

Example pull command:

docker pull snowplow/scala-stream-collector-kinesis
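
Once pulled, the container is typically started by mounting a collector configuration file and publishing the collector's port. This is a minimal sketch; the config file name, mount path, port, and image tag are examples only and should be adjusted to your setup:

# Mount a local config.hocon and expose the port configured for the collector
docker run --rm \
  -v $PWD/config.hocon:/snowplow/config.hocon \
  -p 8080:8080 \
  snowplow/scala-stream-collector-kinesis:latest \
  --config /snowplow/config.hocon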

Installing the jarfile

You can choose to either:

  1. Download the Scala Stream Collector jarfile, or:
  2. Compile it from source

Download the jarfile

To get a local copy, you can download the jarfile directly from our hosted assets bucket on Amazon S3 – please see our Hosted assets page for details.
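
For illustration only, a download might look like the following; the bucket path and version here are placeholders, so take the actual URL from the Hosted assets page:

# Placeholder URL and version – substitute the real path from the Hosted assets page
wget https://<hosted-assets-bucket>/snowplow-scala-collector-kinesis-<version>.jar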

Compile from source

Alternatively, you can build it from the source files. To do so, you will need Scala and sbt installed.

First, clone the Snowplow Stream Collector repo:

$ git clone https://github.com/snowplow/stream-collector.git

Then use sbt to resolve dependencies, compile the source, and build an assembled fat JAR file containing all dependencies:

$ sbt "project <targeted platform>" assembly

where <targeted platform> can be:

  • kinesis
  • google-pubsub
  • kafka
  • nsq
  • stdout
  • sqs
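
For example, to build the collector for Kinesis:

$ sbt "project kinesis" assembly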

The jar file will be saved as snowplow-scala-collector-[targeted platform]-[version].jar in the [targeted platform]/target/scala-2.12 subdirectory – it is now ready to be deployed.
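
As a sketch of running the assembled jar, the collector takes its configuration via the --config option; the version and config file name below are placeholders:

$ java -jar snowplow-scala-collector-kinesis-<version>.jar --config config.hocon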