The Scala Stream Collector enables near-real-time processing of a Snowplow event stream. Events are received by the collector and delivered to Amazon Kinesis, Google PubSub, Apache Kafka, NSQ, Amazon SQS, or stdout for a custom stream collection process. AWS users should configure the Scala Stream Collector to output to a Kinesis stream (which we call the “raw” stream). They should then set up the enricher to consume the raw events from this stream and write them out to a second, “enriched” stream.
On AWS, there is also the option to configure SQS as the sink for raw events. However, there is currently no direct route from SQS to the enrich process. Users can write their own tool, or they can use the sqs2kinesis application to move the data from SQS to Kinesis and then continue to set up enrichment as described below.
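As a sketch of the Kinesis setup described above, the collector's HOCON configuration might point at a raw stream like this. The stream names, region, and port below are illustrative examples, not defaults, and the exact keys can vary between collector versions:

```
# Illustrative fragment of a collector HOCON configuration (values are examples).
collector {
  interface = "0.0.0.0"
  port = 8080

  streams {
    good = "raw"     # the "raw" stream that Enrich consumes from
    bad = "bad-1"    # events that fail collector-side validation

    sink {
      enabled = kinesis
      region = "eu-central-1"
    }
  }
}
```

Enrich would then be configured to read from the `raw` stream and write its output to a second, “enriched” stream.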
The Scala Stream Collector requires version 11 of the Java Runtime Environment.
Installing the Docker container
The Scala Stream Collector is published as a Docker image; see our Hosted assets page for details.
Example pull command:
docker pull snowplow/scala-stream-collector-kinesis
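After pulling the image, a common way to run it is to mount a configuration file into the container and pass it via `--config`. The file paths, tag, and port below are illustrative placeholders, not values mandated by the image:

```
docker run \
  --rm \
  -v $PWD/config.hocon:/snowplow/config.hocon \
  -p 8080:8080 \
  snowplow/scala-stream-collector-kinesis:latest \
  --config /snowplow/config.hocon
```

The published port should match the `collector.port` setting in your configuration file.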
Installing the jarfile
You can choose to either:
- Download the Scala Stream Collector jarfile, or:
- Compile it from source
Download the jarfile
To get a local copy, you can download the jarfile directly from our hosted assets bucket on Amazon S3; see our Hosted assets page for details.
Compile from source
To do so, clone the Snowplow Stream Collector repo:
$ git clone https://github.com/snowplow/stream-collector.git
Use sbt to resolve dependencies, compile the source, and build an assembled fat JAR file with all dependencies:
$ sbt "project *targeted platform*" assembly
where targeted platform can be one of: kinesis, pubsub, kafka, nsq, sqs, or stdout.
The jar file will be saved as
snowplow-scala-collector-[targeted platform]-[version].jar in the
[targeted platform]/target/scala-2.12 subdirectory; it is now ready to be deployed.
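Once built (or downloaded), the jar can be launched with a configuration file passed via `--config`. The version placeholder and config filename below are illustrative:

```
java -jar snowplow-scala-collector-kinesis-[version].jar --config config.hocon
```

As with the Docker image, the configuration file determines the interface, port, and sink stream the collector uses.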