1. Home
  2. Docs
  3. Try Snowplow
  4. Tutorials
  5. Tutorial: Web analytics

Tutorial: Web analytics

Introduction

Snowplow empowers you to collect granular data from your website(s). To gain a better understanding of how users are engaging, let’s start off by aggregating this data into sessions. While sessions don’t tell you everything, and don’t necessarily reflect the entire customer journey, they are great for some initial exploratory analysis such as:

  • How many sessions does each of your marketing channels generate?
  • What is the average time users spend engaging with your site in a given session? How does that compare to the average session length?
  • How many pages do users look at in a given session?
  • Etc.

What you’ll be doing

You have already set up Snowplow’s out of the box web tracking by instrumenting the Javascript Tracker in your application. This includes tracking page_view and page_ping events.

The next step is to aggregate these events into sessions. For that purpose, you’ll be reviewing the default Snowplow session logic, and then running a simple SQL query to model the data.

Updating the sessionization logic (optional)

The Snowplow JavaScript tracker automatically tracks a session identifier and a session index with all web events. Sessions are reset after 30 minutes of inactivity by default, but this can be changed this in the tracker initialization by adding the sessionCookieTimeout (in seconds):

window.snowplow("newTracker", "sp", ..., { appId: "try-snowplow-tracking", platform: "web", sessionCookieTimeout: 3600, contexts: { webPage: true, performanceTiming: true } });
Code language: JavaScript (javascript)

Furthermore, you can manually reset a session, for example after a conversion, like so:

window.snowplow("newSession");
Code language: JavaScript (javascript)

Go ahead and update the sessionization logic in your tracker implementation if you would like to. More information on the Snowplow session cookie can be found here.

Modeling sessions

What does the model do?

For this recipe you’ll create a simple session table describing web engagement by running the following query in your query tool of choice. This is a very simplified version of the sessions table produced by our standard web data model. For each session, it will capture the session ID, session start and end times, marketing channel as well as engagement information: page views, link clicks and time engaged (in seconds).

First generate the table:

CREATE TABLE derived.sessions AS( WITH sessions AS ( SELECT ev.domain_sessionid AS session_id, MIN(ev.derived_tstamp) AS session_start, MAX(ev.derived_tstamp) AS session_end, SUM(CASE WHEN ev.event_name = 'page_view' THEN 1 ELSE 0 END) AS page_views, SUM(CASE WHEN ev.event_name = 'link_click' THEN 1 ELSE 0 END) AS link_clicks, 10*SUM(CASE WHEN ev.event_name = 'page_ping' THEN 1 ELSE 0 END) AS time_engaged_in_s FROM atomic.events AS ev GROUP BY 1 ) SELECT -- session information s.session_id, s.session_start, s.session_end, -- marketing channel CASE WHEN ev.refr_medium IS NULL AND ev.page_url NOT ILIKE '%utm_%' THEN 'Direct' WHEN (ev.refr_medium = 'search' AND ev.mkt_medium IS NULL) OR (ev.refr_medium = 'search' AND ev.mkt_medium = 'organic') THEN 'Organic Search' WHEN ev.refr_medium = 'search' AND ev.mkt_medium SIMILAR TO '%(cpc|ppc|paidsearch)%' THEN 'Paid Search' WHEN ev.refr_medium = 'social' OR ev.mkt_medium SIMILAR TO '%(social|social-network|social-media|sm|social network|social media)%' THEN 'Social' WHEN ev.refr_medium = 'email' OR ev.mkt_medium ILIKE 'email' THEN 'Email' WHEN ev.mkt_medium SIMILAR TO '%(display|cpm|banner)%' THEN 'Display' ELSE 'Other' END AS marketing_channel, -- activity s.page_views, s.link_clicks, s.time_engaged_in_s FROM atomic.events AS ev INNER JOIN sessions AS s ON ev.domain_sessionid = s.session_id AND ev.derived_tstamp = s.session_start GROUP BY 1,2,3,4,5,6,7 );
Code language: SQL (Structured Query Language) (sql)

And then view it:

SELECT * FROM derived.sessions;
Code language: SQL (Structured Query Language) (sql)

Other queries you might want to run:

Sessions by marketing channel:

SELECT marketing_channel, COUNT(DISTINCT session_id) AS sessions FROM derived.sessions GROUP BY 1 ORDER BY 2 DESC;
Code language: SQL (Structured Query Language) (sql)

Average number of page views and time engaged in seconds per session:

SELECT AVG(page_views) AS avg_page_views, AVG(time_engaged_in_s) AS avg_time_engaged_in_s FROM derived.sessions;
Code language: SQL (Structured Query Language) (sql)

Let’s break down what you’ve done

  • You have captured a session identifier with all web events, and customized the sessionization logic to match your requirements.
  • You have run a simple SQL query to model the Snowplow data collected from your website into sessions. Based on the sessions table, you can easily see how users are engaging with your site.

Further reading

For more information on how Snowplow models web data, take a look at our standard web data model.

A general view of Snowplow for web can also be found on our website.