Snowplow is an Open-Core Behavioral Data Platform that helps businesses of all sizes collect, govern and model behavioral data. This product is built on the world’s largest open source project for collecting behavioral data.
This quick start guide will get you up and running with a Snowplow open source pipeline – and deliver rich behavioural data to stream, lake and warehouse in less than an hour.
The data pipeline that you will have set up by the end of this guide will look similar to the following (this will vary by cloud and can be designed to suit your needs due to our modular approach):
We have built a set of Terraform modules that automate the setup and deployment of the infrastructure and applications required for an operational Snowplow open source pipeline, with just a handful of input variables required on your side.
By the end of this guide, you will be able to:
- Collect granular, well-structured data with our suite of web, mobile and server-side SDKs
- Create your own custom events and entities
- Easily enable and disable our suite of out-of-the-box enrichments
- Consume your rich data in real time from a choice of supported destinations, including Kinesis, Pub/Sub, S3, Postgres and Elasticsearch
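To make "custom events and entities" more concrete, here is a minimal sketch of the self-describing JSON envelope that Snowplow uses for them: a `schema` field holding an Iglu URI and a `data` field holding the payload. The vendor `com.example`, the schema names, the payload fields and the helper `self_describing` are all hypothetical, and the URI check is a simplified pattern for illustration rather than the full Iglu validation rules.

```python
import json
import re

# Simplified pattern for an Iglu schema URI: iglu:vendor/name/format/version.
# This is an illustrative check only, not the complete Iglu specification.
IGLU_URI = re.compile(
    r"^iglu:[A-Za-z0-9_.-]+/[A-Za-z0-9_-]+/[A-Za-z0-9_-]+/\d+-\d+-\d+$"
)


def self_describing(schema: str, data: dict) -> dict:
    """Wrap a payload in a self-describing JSON envelope (hypothetical helper)."""
    if not IGLU_URI.match(schema):
        raise ValueError(f"not a valid Iglu schema URI: {schema!r}")
    return {"schema": schema, "data": data}


# Hypothetical custom event: something that happened.
event = self_describing(
    "iglu:com.example/add_to_basket/jsonschema/1-0-0",
    {"sku": "ABC-123", "quantity": 2},
)

# Hypothetical custom entity: context attached to the event.
entity = self_describing(
    "iglu:com.example/user_plan/jsonschema/1-0-0",
    {"plan": "trial"},
)

print(json.dumps({"event": event, "entities": [entity]}, indent=2))
```

In a real deployment the schemas would be defined and published by you, and the SDKs would send these payloads to the collector; the sketch only shows the shape of the data.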
How does the quick start edition compare?
Out-of-the-box, the quick start edition will:
- Handle up to ~100 events per second (~9 million events per day)
- Cost ~$200 per month in AWS infrastructure costs at ~100 events per second (depending on data transfer costs), or ~$240 per month on GCP
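The throughput and cost figures above can be checked with a little arithmetic. The sketch below converts events per second into events per day (100 events per second is 8.64 million per day, which the guide rounds to ~9 million) and derives a rough cost per million events. It assumes a constant load and a 30-day month, neither of which is stated in the guide, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(events_per_second: float) -> float:
    """Daily volume for a constant event rate."""
    return events_per_second * SECONDS_PER_DAY


def cost_per_million_events(
    monthly_cost_usd: float,
    events_per_second: float,
    days_per_month: int = 30,  # assumption: 30-day month
) -> float:
    """Rough infrastructure cost per million events at a constant rate."""
    monthly_events = events_per_day(events_per_second) * days_per_month
    return monthly_cost_usd / (monthly_events / 1_000_000)


daily = events_per_day(100)
print(f"{daily:,.0f} events per day")

# Approximate monthly costs quoted in the guide for ~100 events per second.
for cloud, monthly_cost in {"AWS": 200, "GCP": 240}.items():
    per_million = cost_per_million_events(monthly_cost, 100)
    print(f"{cloud}: ~${per_million:.2f} per million events")
```

Real traffic is rarely constant, so treat the per-million figure as an order-of-magnitude estimate rather than a quote.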
It will get you up and running with Snowplow’s open source product as quickly as possible, so that you can start exploring how to run and manage a Snowplow pipeline that helps you deliver value from rich, high-quality behavioural data across multiple use cases.
|Edition|Description|Limitations|
|---|---|---|
|Try Snowplow|Easily deployed by Data Analysts with no experience in DevOps, who want to learn about the Snowplow data.|No custom enrichments<br>No custom events & entities<br>No first party server cookies<br>No real time stream; real time POC not possible|
|Open Source Quick Start|Easily deployed by Data Engineers who want to learn about the Snowplow pipeline and data.|Handles ~100 events per second out-of-the-box (can be configured to handle a higher throughput)<br>No support for the following destinations: BigQuery (coming soon!)|
|Open Source|The building blocks that give you the freedom to run and manage the leading open source data pipeline.|No implementation or ongoing guidance & strategic support<br>No managed upgrades<br>24×7 technical support not provided<br>No SLAs<br>No data quality monitoring dashboard, or failed event alerting<br>No UI for pipeline monitoring or editing your pipeline configurations<br>No UI for adding & evolving your data structures<br>No data modelling UI to manage how data is prepared and monitor lineage<br>No access to cloud outage protection, or cross cloud data delivery|
|Insights|The best Behavioural Data Platform on the market. Reliable, secure, backed by SLAs plus unrivalled expertise that comes from running data pipelines for 100s of customers.|None|
See the side-by-side feature comparison here.