Referer Parser Enrichment

Summary

The referer parser enrichment uses the Snowplow referer-parser to extract attribution data from referer URLs. You can provide a list of internal subdomains which will be treated as “internal” rather than unknown.

Overview

Knowing which sites refer users to your website is a staple of analytics and helps you understand traffic patterns. The referer parser takes the referring URL and matches it to the company or site it belongs to.

This is particularly useful when looking for traffic from specific sources, such as search engine providers or social networks. Rather than making you scour a full list of referrer URLs, this enrichment adds fields that let you build reports that group together the subdomains of the bigger referrers.

The results of the referer parser lookup end up in the atomic.events table in your data warehouse, in the columns refr_medium (categories such as social or search), refr_source (companies such as Google or Facebook), and others with the ‘refr’ prefix.

If you specify particular subdomains in the enrichment configuration file, traffic from those subdomains is grouped under “Internal” rather than “Unknown”, which makes reports clearer.
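
To make the grouping concrete, here is a minimal Python sketch of the kind of check involved. It is not the enrichment's actual implementation: the function name is_internal_referrer, the INTERNAL_DOMAINS constant, and the exact, case-insensitive host comparison are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list mirroring the subdomains given in the enrichment configuration.
INTERNAL_DOMAINS = {
    "console.snowplowanalytics.com",
    "discourse.snowplowanalytics.com",
}


def is_internal_referrer(referrer_url, internal_domains=INTERNAL_DOMAINS):
    """Return True if the referrer's host is one of the configured subdomains.

    Assumption: hosts are compared by exact, case-insensitive match.
    """
    host = (urlparse(referrer_url).hostname or "").lower()
    return host in internal_domains


if __name__ == "__main__":
    print(is_internal_referrer("https://console.snowplowanalytics.com/dashboard"))
    print(is_internal_referrer("https://www.google.com/search?q=snowplow"))
```

Under these assumptions, the first URL would be treated as internal and the second would be left to the normal referer lookup.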

Example

Snowplow has several subdomains, such as console.snowplowanalytics.com and discourse.snowplowanalytics.com. As users move from these subdomains to our main snowplowanalytics.com domain, we would like to capture that traffic as being referred from an “internal” medium. We would therefore set the configuration as follows:

{
    "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0",
    "data": {
        "name": "referer_parser",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "internalDomains": [
                "console.snowplowanalytics.com",
                "discourse.snowplowanalytics.com"
            ]
        }
    }
}

Enabling this enrichment with the above configuration fills the refr_medium column in our data warehouse with “Internal” whenever the referring URL of a page matches one of the subdomains above.

If we then ran a query that counts sessions for each distinct refr_medium value, we could get a table like the one below:

refr_medium    Sessions
Search         272,699
Internal       142,555
Unknown        127,335
Social         14,525
Email          5,345
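
A query of the kind described above might look like the following sketch. It runs against an in-memory SQLite table rather than a real warehouse. The table name events and the sample rows are made up for illustration, and counting sessions by distinct domain_sessionid is an assumption about how sessions are identified in your data.

```python
import sqlite3

# Toy stand-in for the atomic.events table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (domain_sessionid TEXT, refr_medium TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("s1", "Search"),
        ("s1", "Search"),  # second event in the same session
        ("s2", "Internal"),
        ("s3", "Search"),
        ("s4", "Social"),
    ],
)

# Count distinct sessions per referrer medium, largest group first.
QUERY = """
SELECT refr_medium, COUNT(DISTINCT domain_sessionid) AS sessions
FROM events
GROUP BY refr_medium
ORDER BY sessions DESC, refr_medium
"""

rows = conn.execute(QUERY).fetchall()
for medium, sessions in rows:
    print(medium, sessions)
```

In a real warehouse the same SELECT would target atomic.events, with the syntax adjusted to your database as needed.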