Backpopulate the manifest

To pre-populate the manifest with run IDs that should never be loaded, you can use the backfill.py script.

The script requires Python 3, Snowplow Python Analytics SDK 0.2.3+ and boto3:

$ pip install boto3 snowplow_analytics_sdk
$ wget https://raw.githubusercontent.com/snowplow-incubator/snowplow-snowflake-loader/release/0.4.0/backfill.py   # Won't actually be downloaded as repository is private

The script accepts 6 required arguments. Note startdate: this is the date from which (inclusive) the transformer should process run IDs. Run IDs earlier than this date are added to the manifest so that they are never loaded:

$ ./backfill.py \
    --startdate 2017-08-22-01-01-01 \
    --region $AWS_REGION \
    --manifest-table-name $DYNAMODB_MANIFEST \
    --enriched-archive $TRANSFORMER_INPUT \
    --aws-access-key-id=$AWS_ACCESS_KEY_ID \
    --aws-secret-access-key=$AWS_SECRET_KEY
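
To make the inclusive startdate boundary concrete, here is a minimal Python sketch of how run IDs could be split around it. This is not the code of backfill.py. The function names parse_run_id and split_run_ids are hypothetical, and the sketch assumes that folders in the enriched archive are named run=YYYY-MM-DD-hh-mm-ss, matching the format of the --startdate example above.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the --startdate example above.
RUN_ID_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_run_id(folder):
    """Parse a folder name such as 'run=2017-08-22-01-01-01/' into a datetime.

    The 'run=' prefix is an assumption about the archive layout.
    """
    name = folder.strip("/").split("/")[-1]
    if name.startswith("run="):
        name = name[len("run="):]
    return datetime.strptime(name, RUN_ID_FORMAT)


def split_run_ids(folders, startdate):
    """Split folders into (skip, process) around an inclusive start date.

    'skip' holds run IDs older than startdate (candidates for the manifest),
    'process' holds run IDs at or after startdate.
    """
    start = datetime.strptime(startdate, RUN_ID_FORMAT)
    skip, process = [], []
    for folder in folders:
        if parse_run_id(folder) >= start:
            process.append(folder)
        else:
            skip.append(folder)
    return skip, process


if __name__ == "__main__":
    # Hypothetical folder names, for illustration only.
    example_folders = [
        "run=2017-08-21-23-59-59/",
        "run=2017-08-22-01-01-01/",
        "run=2017-09-01-00-00-00/",
    ]
    skip, process = split_run_ids(example_folders, "2017-08-22-01-01-01")
    print("skip:", skip)
    print("process:", process)
```

In this sketch a run ID exactly equal to startdate lands in the process list, which mirrors the "inclusive" wording above. Only the run IDs in the skip list would be candidates for the manifest.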