Referer parser Enrichment

Compatibility

JSON Schema:    iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0
Compatibility:  0.9.6+
Data provider:  Snowplow referer-parser

Overview

The referer parser enrichment uses the Snowplow referer-parser to extract attribution data from referer URLs. You can provide a list of internal subdomains, which will be treated as “internal” rather than unknown. Each subdomain has to be listed individually; wildcards and regular expressions are not currently supported. Note that this project always uses the original HTTP misspelling ‘referer’ (and thus ‘referal’), never ‘referrer’.

Example

{
	"schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0",
	"data": {
		"name": "referer_parser",
		"vendor": "com.snowplowanalytics.snowplow",
		"enabled": true,
		"parameters": {
			"internalDomains": [
				"subdomain1.mysite.com",
				"subdomain2.mysite.com"
			]
		}
	}
}

In this example, if the host of an event’s referer URL is either “subdomain1.mysite.com” or “subdomain2.mysite.com”, the referer will be counted as internal.
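
The matching rule can be illustrated with a minimal Python sketch. It is not the enrichment's actual code: the helper name is_internal is hypothetical, and the sketch assumes that matching is an exact comparison of the referer host against each listed subdomain, consistent with the lack of wildcard support described above.

```python
import json
from urllib.parse import urlsplit

# Same configuration as the example above.
CONFIG = json.loads("""
{
  "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0",
  "data": {
    "name": "referer_parser",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "internalDomains": ["subdomain1.mysite.com", "subdomain2.mysite.com"]
    }
  }
}
""")


def is_internal(referer_url: str, config: dict = CONFIG) -> bool:
    """Hypothetical helper: True if the referer host is listed in internalDomains.

    Assumption: hosts are compared exactly (no wildcards, no regex).
    """
    host = urlsplit(referer_url).hostname or ""
    domains = set(config["data"]["parameters"]["internalDomains"])
    return host in domains


print(is_internal("https://subdomain1.mysite.com/pricing"))
print(is_internal("https://subdomain3.mysite.com/"))
```

Under this assumption, “subdomain3.mysite.com” is not treated as internal, because it is not listed individually.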

Data sources

The input value for the enrichment comes from the refr parameter, which is mapped to the page_referrer field in the atomic.events table. The domain name of the server that the current page was served from is taken from the url parameter and ends up in the page_urlhost field of the atomic.events table. It is extracted during the common enrichment process by means of the ConversionUtils.explodeUri function.
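
As a rough illustration of this mapping, the following Python sketch derives the two fields from a tracker payload. It is only an approximation using the standard library: derive_fields is a hypothetical helper, the sample URLs are made up, and the real extraction is performed by ConversionUtils.explodeUri in the enrichment process.

```python
from typing import Optional
from urllib.parse import urlsplit


def derive_fields(params: dict) -> dict:
    """Hypothetical helper mapping payload parameters to atomic.events fields."""
    page_url: Optional[str] = params.get("url")
    return {
        # refr is carried over as the page referer.
        "page_referrer": params.get("refr"),
        # The host component of url becomes page_urlhost.
        "page_urlhost": urlsplit(page_url).hostname if page_url else None,
    }


# Made-up sample payload for illustration.
payload = {
    "url": "https://www.mysite.com/products?id=1",
    "refr": "https://www.google.com/search?q=snowplow",
}
print(derive_fields(payload))
```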

Algorithm

The referer parser enrichment uses the Snowplow referer-parser library to extract attribution data from referer URLs. It supports four mediums:

  • unknown for when we know the source, but not the medium
  • email for webmail providers
  • social for social media services
  • search for search engines

The referer-parser identifies whether a URL is a known referer by checking it against the referers.yml file. Additionally, the current page's domain name (the value of page_urlhost) and the domain names listed in the internalDomains parameter of referer_parser.json are used to determine whether the referer is internal.
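
The lookup described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the referer-parser library itself: KNOWN_REFERERS is a toy stand-in for referers.yml with made-up entries, classify is a hypothetical function, a referer is assumed to be internal when its host equals page_urlhost or appears in internalDomains, and unrecognised hosts are assumed to fall back to the unknown medium.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class Referer(NamedTuple):
    medium: str            # becomes refr_medium
    source: Optional[str]  # becomes refr_source
    term: Optional[str]    # becomes refr_term


# Toy stand-in for referers.yml: host -> (medium, source, search-term parameters).
KNOWN_REFERERS = {
    "www.google.com": ("search", "Google", ["q"]),
    "twitter.com": ("social", "Twitter", []),
    "mail.google.com": ("email", "Gmail", []),
}


def classify(referer_url: str, page_urlhost: str,
             internal_domains: List[str]) -> Optional[Referer]:
    parts = urlsplit(referer_url)
    host = parts.hostname
    if host is None:
        return None  # no usable referer
    # Assumption: same host as the page, or a listed internal domain.
    if host == page_urlhost or host in internal_domains:
        return Referer("internal", None, None)
    entry = KNOWN_REFERERS.get(host)
    if entry is None:
        # Assumption: unrecognised hosts fall back to "unknown".
        return Referer("unknown", None, None)
    medium, source, term_params = entry
    query = parse_qs(parts.query)
    # Take the first search-term parameter present in the query string.
    term = next((query[p][0] for p in term_params if p in query), None)
    return Referer(medium, source, term)


result = classify(
    "https://www.google.com/search?q=snowplow+analytics",
    "www.mysite.com",
    ["subdomain1.mysite.com", "subdomain2.mysite.com"],
)
print(result)
```

The three members of the returned tuple correspond to the refr_medium, refr_source and refr_term fields.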

Data generated

Below is a summary of the fields in the atomic.events table that are populated by this enrichment (there is no dedicated table).

Field        Purpose
refr_medium  Type of referer (e.g. ‘search’, ‘internal’)
refr_source  Name of referer if recognised
refr_term    Keywords if source is a search engine