Pipeline Components and Applications

  1. Home
  2. Docs
  3. Pipeline Components and Applications
  4. Loaders and storage targets
  5. Snowplow RDB Loader
  6. DynamoDB Table

DynamoDB Table

In order to perform deduplication in the RDB shredder, a dynamoDB table is needed to store the event manifest.

1. Creating Amazon DynamoDB table

If Relational Database Shredder doesn’t find a specified table – it will try to create it with default provisioned throughput, which might be not sufficient. This step is optional, but recommended.

The table name can be anything, but must be unique.

Partition key must be called eventId and have type String. Sort key must be called fingerprint and have type String.

Uncheck “Use default settings” checkbox and set “Write capacity units” to 100. Capacity units value is individual and should tweaked depending on your cluster size.

Create Table

After table is created, write down “Amazon Resource Name (ARN)” in “Overview” tab. It should look similar to arn:aws:dynamodb:us-east-1:719197435995:table/one-more-deduplication-test This ARN will be used in next step.

Finding the Table ARN

2. Setting up IAM Policy

Log into the AWS console, navigate to the IAM section and go to Policies:

Select Create Your Own Policy and choose descriptive name. Paste following JSON as Policy Document:

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1486765706000", "Effect": "Allow", "Action": [ "dynamodb:CreateTable", "dynamodb:DeleteTable", "dynamodb:DescribeTable", "dynamodb:PutItem" ], "Resource": [ "arn:aws:dynamodb:us-east-1:719197435995:table/snowplow-deduplication" ] } ] }
Code language: JSON / JSON with Comments (json)

Notice element in Resourse array. It must be changed to your ARN from previous step.

dynamodb:CreateTable and dynamodb:DeleteTable are unnecessary if you already created table.

Attach the IAM policy