The Scala Stream Collector allows near-real-time processing of a Snowplow event stream. Events are received by the collector and delivered to Amazon Kinesis, Google PubSub, Apache Kafka, NSQ, or stdout for a custom stream collection process.
AWS users should configure the Scala Stream Collector to output to a Kinesis stream (which we call the “raw” stream). They should then set up the enricher to consume the raw events from this stream and write them out to a second, “enriched” stream.
To run the Scala Stream Collector, you will need version 8 (aka 1.8) of the Java Runtime Environment.
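Because Java 8 reports its version string in the legacy "1.8.x" form, a quick check before installing can avoid confusion. The following is a minimal sketch, not part of the collector itself: the helper names `is_java8` and `installed_java_version` are hypothetical, and it assumes the `java` binary prints a quoted version string on stderr, as the common OpenJDK and Oracle builds do.

```shell
# Return success if the given version string denotes Java 8,
# in either the legacy "1.8.x" form or the plain "8" form.
is_java8() {
  case "$1" in
    1.8|1.8.*|8|8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Extract the quoted version string from `java -version`,
# which writes its output to stderr rather than stdout.
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Only run the check when a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  if is_java8 "$(installed_java_version)"; then
    echo "Java 8 found"
  else
    echo "Java found, but not version 8"
  fi
fi
```

If no `java` binary is on the PATH, the check is skipped silently.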
2a.1 Installing the Docker container
The Scala Stream Collector is published as a Docker image. Please see our Hosted assets page for details.
Example pull command:
$ docker pull snowplow/scala-stream-collector-kinesis
2b.1 Installing the jarfile
You can either:
- Download the Scala Stream Collector jarfile, or
- Compile it from source
2b.2 Download the jarfile
To get a local copy, you can download the jarfile directly from our hosted assets bucket on Amazon S3 – please see our Hosted assets page for details.
2b.3 Compile from source
To do so, clone the Snowplow Stream Collector repo:
$ git clone https://github.com/snowplow/stream-collector.git
Use sbt to resolve dependencies, compile the source, and build an assembled fat JAR file with all dependencies:
$ sbt "project *targeted platform*" assembly
targeted platform can be:
The jar file will be saved as
snowplow-scala-collector-[targeted platform]-[version].jar in the
[targeted platform]/target/scala-2.12 subdirectory. It is now ready to be deployed.
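To make the naming pattern above concrete, the jar location can be assembled from the two placeholders. This is only an illustrative sketch: `kinesis` is assumed here as the targeted platform, and `x.y.z` is a placeholder, not a real version number. Substitute the values from your own build.

```shell
# Hypothetical values for illustration only; replace with your own
# targeted platform and the version you actually built.
platform="kinesis"
version="x.y.z"

# Follows the pattern described above:
# [targeted platform]/target/scala-2.12/snowplow-scala-collector-[targeted platform]-[version].jar
jar_name="snowplow-scala-collector-${platform}-${version}.jar"
jar_path="${platform}/target/scala-2.12/${jar_name}"

echo "$jar_path"
```

Running `ls "$jar_path"` from the repository root after the build is a quick way to confirm that the assembly step produced the file.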