The Scala Stream Collector allows near-real-time processing of a Snowplow event stream. Events are received by the collector and delivered to Amazon Kinesis, Google PubSub, Apache Kafka, NSQ, or stdout for a custom stream collection process. AWS users should configure the Scala Stream Collector to output to a Kinesis stream (which we call the “raw” stream). They should then set up the enricher to consume the raw events from this stream and write them out to a second, “enriched” stream.
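To make the setup concrete, here is a minimal sketch of how the raw stream is referred to in the collector’s HOCON configuration file. Key names vary between collector versions, and the stream names and region below are placeholder values, so treat the example config.hocon shipped with each release as authoritative.
# Sketch only: key names vary by collector version; the stream names
# and region below are placeholders.
$ cat > config.hocon <<'EOF'
collector {
  interface = "0.0.0.0"
  port = 8080
  streams {
    good = "snowplow-raw"  # the "raw" stream the enricher consumes
    bad = "snowplow-bad"   # stream for failed events
    sink {
      region = "eu-west-1"
    }
  }
}
EOF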
1. Dependencies
- Version 8 (aka 1.8) of the Java Runtime Environment, or:
- Docker
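If you plan to run the jarfile directly, you can check that a suitable runtime is on your path; the command below should report a 1.8.x version.
$ java -version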
2a.1 Installing the Docker container
The Scala Stream Collector is published as a Docker image; please see our Hosted assets page for details.
Example pull command:
docker pull snowplow/scala-stream-collector-kinesis
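Once pulled, the container needs a configuration file mounted into it. The following is a sketch rather than a canonical invocation: the mount path, published port, and config file name are assumptions for illustration, so adjust them to match your own setup.
# Assumed mount path, port, and config name; adjust to your setup
$ docker run --rm \
    -p 8080:8080 \
    -v $PWD/config.hocon:/snowplow/config.hocon \
    snowplow/scala-stream-collector-kinesis \
    --config /snowplow/config.hocon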
2b.1 Installing the jarfile
You can choose to either:
- Download the Scala Stream Collector jarfile, or:
- Compile it from source
2b.2 Download the jarfile
To get a local copy, you can download the jarfile directly from our hosted assets bucket on Amazon S3 – please see our Hosted assets page for details.
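For illustration only, the download might look like the following; the URL shape is hypothetical, and the real bucket path and latest version number are listed on the Hosted assets page.
# Hypothetical URL: substitute the real path from the Hosted assets page
$ wget https://<hosted-assets-bucket>/snowplow-scala-collector-kinesis-<version>.jar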
2b.3 Compile from source
Alternatively, you can build it from the source files. To do so, you will need Scala and sbt installed.
First, clone the Snowplow Stream Collector repo:
$ git clone https://github.com/snowplow/stream-collector.git
Use sbt to resolve dependencies, compile the source, and build an assembled fat JAR file with all dependencies:
$ sbt "project *targeted platform*" assembly
where targeted platform can be one of the following (a concrete example follows the list):
- kinesis
- google-pubsub
- kafka
- nsq
- stdout
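For example, to build the Kinesis flavor of the collector:
$ sbt "project kinesis" assembly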
The jar file will be saved as snowplow-scala-collector-[targeted platform]-[version].jar in the [targeted platform]/target/scala-2.12 subdirectory. It is now ready to be deployed.
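The collector can then be started by passing it a configuration file. As a minimal sketch, assuming a Kinesis build and a config.hocon in the working directory (substitute the actual version in the file name):
$ java -jar snowplow-scala-collector-kinesis-[version].jar --config config.hocon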