Snowplow is an Open-Core Behavioral Data Platform that helps businesses of all sizes collect, govern and model behavioral data. This product is built on the world’s largest open source project for collecting behavioral data.
This quick start guide will get you up and running with a Snowplow open source pipeline – and deliver rich behavioural data to stream, lake and warehouse in less than an hour.
Note: the quick start is currently available only on AWS; GCP support is coming soon.
The data pipeline that you will have set up by the end of this guide will look as follows:
We have built a set of Terraform modules that automate the setup and deployment of the infrastructure and applications required for an operational Snowplow open source pipeline, with just a handful of input variables required on your side.
By the end of this guide, you will be able to:
- Collect granular, well-structured data with our suite of web, mobile and server side SDKs
- Create your own custom events and entities
- Easily enable and disable our suite of out-of-the-box enrichments
- Consume your rich data in real time from Kinesis
- Query your data on S3 and in Postgres
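As a rough illustration of the Kinesis bullet above, the sketch below reads one batch of records per shard and splits each record into fields. It assumes that enriched events arrive on the stream as tab-separated text and that the third-party `boto3` AWS SDK is installed. The stream name and region in the usage comment are hypothetical placeholders, not values created by the quick start.

```python
from typing import Iterator, List


def split_enriched_event(data: bytes) -> List[str]:
    """Split one tab-separated event payload into its fields.

    Empty fields are preserved so that field positions stay stable.
    """
    return data.decode("utf-8").rstrip("\n").split("\t")


def read_enriched_events(
    stream_name: str, region: str, limit: int = 10
) -> Iterator[List[str]]:
    """Yield the fields of up to `limit` records from each shard.

    Reads a single batch per shard, starting at the oldest retained record.
    A real consumer would keep polling with the returned NextShardIterator.
    """
    import boto3  # third-party; imported lazily so the helper above has no dependency

    client = boto3.client("kinesis", region_name=region)
    for shard in client.list_shards(StreamName=stream_name)["Shards"]:
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        for record in response["Records"]:
            # boto3 returns the record payload as raw bytes under "Data"
            yield split_enriched_event(record["Data"])


# Hypothetical usage (stream name and region are placeholders):
# for fields in read_enriched_events("my-enriched-stream", "eu-west-1"):
#     print(len(fields), fields[:3])
```

Using `TRIM_HORIZON` replays whatever the stream still retains, which is handy when exploring. Switching to `LATEST` would read only events that arrive after the iterator is created.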
How does the quick start edition compare?
The quick start edition is intended for proofs of concept and pre-production use cases. It will:
- Handle up to ~100 events per second (~9 million events per day)
- Cost roughly $130 per month for ~50 events per second in AWS infrastructure costs
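The figures above can be checked with a little arithmetic. The sketch below converts a sustained event rate into a daily volume and derives an approximate infrastructure cost per million events. It assumes a constant rate and a 30-day month, neither of which is stated in this guide, so treat the result as a ballpark only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second: float) -> float:
    """Daily volume for a constant event rate."""
    return events_per_second * SECONDS_PER_DAY


def cost_per_million_events(
    monthly_cost_usd: float, events_per_second: float, days_per_month: int = 30
) -> float:
    """Approximate USD cost per million events (assumes a 30-day month)."""
    monthly_events = events_per_day(events_per_second) * days_per_month
    return monthly_cost_usd / (monthly_events / 1_000_000)


if __name__ == "__main__":
    # 100 events/s is 8.64 million events/day, which the guide rounds to ~9 million
    print(f"{events_per_day(100):,.0f} events per day at 100 events/s")
    # $130/month at 50 events/s works out to roughly $1 per million events
    print(f"${cost_per_million_events(130, 50):.2f} per million events")
```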
The quick start edition will give you a good grasp of how the Snowplow pipeline has been architected for scale and reliability, but it does not itself provide that scale and reliability out of the box. Its purpose is to get you up and running as quickly as possible with Snowplow’s open source product, so that you can start exploring how to run and manage a Snowplow pipeline that delivers rich, high-quality behavioural data to power multiple use cases.
| Edition | Description | Limitations |
|---|---|---|
| Try Snowplow | Easily deployed by Data Analysts with no experience in DevOps, who want to learn about the Snowplow data. | No custom enrichments; no custom events & entities; no first party server cookies; no real time stream (real time POC not possible) |
| Open Source Quick Start | Easily deployed by Data Engineers who want to learn about the Snowplow pipeline and data. | Max ~100 events per second out-of-the-box |
| Open Source | The building blocks that give you the freedom to run and manage the leading open source data pipeline. | No implementation or ongoing guidance & strategic support; no managed upgrades; no 24×7 technical support; no SLAs; no data quality monitoring dashboard or failed event alerting; no UI for pipeline monitoring or editing your pipeline configurations; no UI for adding & evolving your data structures; no data modelling UI to manage how data is prepared and monitor lineage; no access to cloud outage protection or cross cloud data delivery |
| Insights | The best Behavioural Data Platform on the market. Reliable, secure, backed by SLAs plus unrivalled expertise that comes from running data pipelines for 100s of customers. | None |
See side by side feature comparison here.