Configuring DataHub for Ingesting DynamoDB Tables from Specific AWS Accounts

Original Slack Thread

<@U06TM7M40RK> I want dynamodb databases only from specific accounts to be ingested. How to configure?

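To ingest DynamoDB tables only from specific AWS accounts, run the dynamodb source with credentials that belong to (or can assume a role in) each account you want. The source discovers tables through the DynamoDB API, which returns only the tables of the account and region that the credentials resolve to, so the credentials decide which account is ingested. The platform_instance setting does not filter anything: it is a label that becomes part of the dataset URNs. Setting it to the AWS account ID (or an account alias) keeps tables from different accounts distinguishable in DataHub.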

Here is a basic example of how to set up your ingestion recipe:

source:
  type: dynamodb
  config:
    platform_instance: "AWS_ACCOUNT_ID"  # Replace with your AWS account ID
    aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
    aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    aws_region: "${AWS_REGION}"  # Specify the AWS region if needed
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

Steps to Configure:

  1. Label the AWS Account: Set the platform_instance field to the AWS account ID (or an alias) so the ingested tables are attributed to that account. This field labels the datasets; it does not filter them.
  2. Provide AWS Credentials: Provide credentials for the account you want to ingest (aws_access_key_id and aws_secret_access_key). These determine which account's tables are listed.
  3. Specify AWS Region: Use the aws_region field to select the region whose tables are ingested.

Example Recipe:

source:
  type: dynamodb
  config:
    platform_instance: "123456789012"  # Replace with your AWS account ID
    aws_access_key_id: "your_access_key_id"
    aws_secret_access_key: "your_secret_access_key"
    aws_region: "us-west-2"  # Replace with your desired AWS region
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
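
If you generate recipes rather than writing them by hand, the same fields can be assembled in a few lines of Python. This is a minimal sketch, not part of DataHub: build_recipe is a hypothetical helper, and its output is JSON, which is also valid YAML, so it can be saved as a recipe file. The ${...} strings are left as-is for DataHub to expand from environment variables when the recipe runs, as in the first recipe above.

```python
import json


def build_recipe(account_id: str, region: str) -> dict:
    """Return a dynamodb recipe for one account (hypothetical helper)."""
    return {
        "source": {
            "type": "dynamodb",
            "config": {
                # Label only: the credentials below decide which account is read.
                "platform_instance": account_id,
                # Left unexpanded here; resolved from the environment at run time.
                "aws_access_key_id": "${AWS_ACCESS_KEY_ID}",
                "aws_secret_access_key": "${AWS_SECRET_ACCESS_KEY}",
                "aws_region": region,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-datahub-gms:8080"},
        },
    }


recipe = build_recipe("123456789012", "us-west-2")
recipe_text = json.dumps(recipe, indent=2)  # JSON documents like this are also valid YAML
print(recipe_text)
```

Keeping the keys out of the generated file means the same recipe can be run against a different account just by changing the environment it runs in.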

<@U06TM7M40RK> what if I have to only use one iam_role?

To configure DataHub to ingest DynamoDB databases using a specific IAM role, you can leverage the AWS IAM role assumption feature. This approach enhances security by avoiding the use of static AWS credentials and instead using IAM roles that can be assumed by the DataHub ingestion process.

Here are the steps to configure your ingestion recipe to use a specific IAM role:

  1. Create an IAM Policy: Define the necessary permissions for the metadata ingestion.
  2. Create an IAM Role: Attach the policy to an IAM role.
  3. Associate IAM Role with Kubernetes Service Account: Link the IAM role to a Kubernetes service account.
  4. Configure the Ingestion Recipe: Set up the ingestion recipe to use the IAM role.

Step-by-Step Guide

Step 1: Create an IAM Policy
Create an IAM policy with the required permissions. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": "*"
        }
    ]
}

Step 2: Create an IAM Role
Create an IAM role and attach the policy created in Step 1. Ensure the role can be assumed by the service or instance running the DataHub ingestion.

Step 3: Associate IAM Role with Kubernetes Service Account
Follow the AWS EKS guide (https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html) to associate a Kubernetes service account with the IAM role. Here is an example command using eksctl:

eksctl create iamserviceaccount \
    --name acryl-datahub-actions \
    --namespace datahub \
    --cluster datahub \
    --attach-policy-arn arn:aws:iam::<<account-id>>:policy/policy1 \
    --approve \
    --override-existing-serviceaccounts

Update the values.yaml file for the DataHub Helm chart to use the service account:

acryl-datahub-actions:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions

Step 4: Configure the Ingestion Recipe
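In your ingestion recipe, you can omit the AWS access key and secret key. When no static keys are set, the ingestion process falls back to the default AWS credential provider chain, which picks up the IAM role bound to the service account. Set aws_role only when the ingestion should assume a further role on top of those credentials; if the pod already runs as the role you want, you can leave aws_role out.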

Example Ingestion Recipe for DynamoDB:

source:
  type: dynamodb
  config:
    platform_instance: "AWS_ACCOUNT_ID"  # Label for the account, for example its AWS account ID
    aws_region: "us-west-2"  # Region whose tables are ingested
    aws_role: "arn:aws:iam::<<account-id>>:role/<<role-name>>"  # Optional: role to assume on top of the default credentials
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
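
Because platform_instance is usually set to the account that the role belongs to, one way to keep the two consistent is to derive the label from the role ARN. The sketch below is illustrative only; account_id_from_role_arn and role_source_config are hypothetical helpers, not DataHub functions.

```python
def account_id_from_role_arn(role_arn: str) -> str:
    """Extract the 12-digit account ID from an IAM role ARN."""
    # An IAM role ARN has the form arn:<partition>:iam::<account-id>:role/<name>
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {role_arn!r}")
    account_id, resource = parts[4], parts[5]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"unexpected account ID in {role_arn!r}")
    if not resource.startswith("role/"):
        raise ValueError(f"not a role ARN: {role_arn!r}")
    return account_id


def role_source_config(role_arn: str, region: str) -> dict:
    """Source config that uses a role and no static keys."""
    return {
        "platform_instance": account_id_from_role_arn(role_arn),
        "aws_region": region,
        "aws_role": role_arn,
    }


config = role_source_config("arn:aws:iam::123456789012:role/role1", "us-west-2")
print(config)
```

A malformed ARN raises ValueError before any recipe is written, which is easier to diagnose than a failed ingestion run.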

How can I ingest tables from multiple accounts? <@U06TM7M40RK>

To ingest DynamoDB tables from multiple AWS accounts, you can configure multiple ingestion recipes, each specifying a different AWS account ID and IAM role. This approach allows you to manage and ingest metadata from various accounts separately.

Steps to Configure Ingestion from Multiple Accounts

  1. Create IAM Policies and Roles: For each AWS account, create an IAM policy with the necessary permissions and attach it to an IAM role.
  2. Associate IAM Roles with Kubernetes Service Accounts: Link each IAM role to a Kubernetes service account.
  3. Configure Multiple Ingestion Recipes: Set up separate ingestion recipes for each AWS account, specifying the respective IAM role and account ID.

Example Configuration

Step 1: Create IAM Policies and Roles
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": "*"
        }
    ]
}

Create IAM roles in each AWS account and attach the respective policies.

Step 2: Associate IAM Roles with Kubernetes Service Accounts
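Follow the AWS EKS guide (https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html) to associate Kubernetes service accounts with the IAM roles. Note that eksctl creates each role in the account that owns the cluster, and a customer managed policy can be attached only to a role in the same account, so the policy ARNs below have to refer to policies in the cluster's account. Access to tables in another account then comes from the aws_role set in that account's recipe (Step 3), which the target account must allow the service account's role to assume. Here is an example command using eksctl: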

eksctl create iamserviceaccount \
    --name acryl-datahub-actions-account1 \
    --namespace datahub \
    --cluster datahub \
    --attach-policy-arn arn:aws:iam::<<account1-id>>:policy/policy1 \
    --approve \
    --override-existing-serviceaccounts

eksctl create iamserviceaccount \
    --name acryl-datahub-actions-account2 \
    --namespace datahub \
    --cluster datahub \
    --attach-policy-arn arn:aws:iam::<<account2-id>>:policy/policy2 \
    --approve \
    --override-existing-serviceaccounts

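Update the values.yaml file for the DataHub Helm chart to use the service accounts. A pod runs under a single service account, so each section below stands for a separate actions deployment (for example, a separate release or values file) rather than two keys in one file: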

acryl-datahub-actions-account1:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions-account1

acryl-datahub-actions-account2:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions-account2

Step 3: Configure Multiple Ingestion Recipes
Create separate ingestion recipes for each AWS account, specifying the respective IAM role and account ID.

Example Ingestion Recipe for Account 1:

source:
  type: dynamodb
  config:
    platform_instance: "123456789012"  # Replace with AWS account ID for account 1
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_role: "arn:aws:iam::123456789012:role/role1"  # Specify the IAM role ARN for account 1
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

Example Ingestion Recipe for Account 2:

source:
  type: dynamodb
  config:
    platform_instance: "987654321098"  # Replace with AWS account ID for account 2
    aws_region: "us-east-1"  # Specify the AWS region if needed
    aws_role: "arn:aws:iam::987654321098:role/role2"  # Specify the IAM role ARN for account 2
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
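
With one recipe per account, the runs can be driven from a short script that calls the DataHub CLI (datahub ingest -c <recipe>) once per file. A minimal sketch follows; the recipe file names are hypothetical, and the script only builds and prints the commands.

```python
import shlex
from typing import Iterable, List


def ingest_commands(recipe_paths: Iterable[str]) -> List[List[str]]:
    """Return one `datahub ingest` command per recipe file."""
    return [["datahub", "ingest", "-c", path] for path in recipe_paths]


# Hypothetical file names, one recipe per AWS account.
commands = ingest_commands(["account1.dhub.yaml", "account2.dhub.yaml"])
for command in commands:
    print(shlex.join(command))
```

To execute the runs instead of printing them, pass each command to subprocess.run(command, check=True), so that a failure in one account stops the script rather than going unnoticed.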

what if I have only one role which is associated with a specific account, but i also want to use it for other accounts? <@U06TM7M40RK>

If you have a single IAM role associated with a specific AWS account but want to use it to ingest DynamoDB tables from multiple accounts, you can configure cross-account access. This involves setting up trust relationships between the accounts, allowing the IAM role in the primary account to assume roles in the other accounts.

Steps to Configure Cross-Account Access

  1. Create IAM Policies and Roles in Each Account: Define the necessary permissions and create IAM roles in each AWS account.
  2. Set Up Trust Relationships: Configure the trust relationships to allow the primary IAM role to assume roles in the other accounts.
  3. Configure the Ingestion Recipe: Set up the ingestion recipe to use the primary IAM role and assume roles in the other accounts.

Example Configuration

Step 1: Create IAM Policies and Roles in Each Account
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": "*"
        }
    ]
}

Create IAM roles in each AWS account and attach the respective policies.

Step 2: Set Up Trust Relationships
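Configure the trust relationships to allow the primary IAM role to assume roles in the other accounts. This involves updating the trust policy of the IAM roles in the other accounts to trust the primary IAM role. The primary role also needs an identity policy that allows sts:AssumeRole on those role ARNs; cross-account role assumption works only when both sides permit it.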

Example Trust Policy for Account 2 (replace the Principal value with the ARN of the primary IAM role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/primary-role"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
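
Cross-account role assumption needs two policy documents: a trust policy on each target role and a permission on the primary role to call sts:AssumeRole. The sketch below builds both as plain dictionaries so they can be reviewed or written to files. The function names are hypothetical and the ARNs are the example values used in this thread.

```python
import json
from typing import List


def trust_policy(primary_role_arn: str) -> dict:
    """Trust policy for a role in a target account: lets the primary role assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": primary_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permission(target_role_arns: List[str]) -> dict:
    """Identity policy for the primary role: lets it call sts:AssumeRole on the targets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": sorted(target_role_arns),
            }
        ],
    }


primary = "arn:aws:iam::123456789012:role/primary-role"
targets = [
    "arn:aws:iam::987654321098:role/role2",
    "arn:aws:iam::112233445566:role/role3",
]
print(json.dumps(trust_policy(primary), indent=4))
print(json.dumps(assume_role_permission(targets), indent=4))
```

json.dumps emits strict JSON, which matters here because IAM policy documents cannot contain comments.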

Step 3: Configure the Ingestion Recipe
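In your ingestion recipes, have the primary IAM role assume the role in each target account. A recipe run connects to one account, so create one recipe per account. aws_role accepts a single role ARN or a list; a list is assumed in order, each role using the credentials obtained from the previous one. (assume_role_arns is not among the documented options of the dynamodb source.) The recipe below covers account 2; repeat it for account 3 with arn:aws:iam::112233445566:role/role3 and platform_instance "112233445566".

Example Ingestion Recipe for Account 2:

source:
  type: dynamodb
  config:
    platform_instance: "987654321098"  # Account 2, whose tables this recipe ingests
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_role:
      - "arn:aws:iam::123456789012:role/primary-role"  # Primary IAM role, assumed first
      - "arn:aws:iam::987654321098:role/role2"  # Role in account 2, assumed from the primary role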
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

<@U06TM7M40RK> I have an account called dev-services which has a role called datahub-ingestion. I want to use only this role to ingest dynamodb tables from multiple accounts like dev-services, test-dataops and dev-admin. How can I do this?

To use a single IAM role (datahub-ingestion in the dev-services account) to ingest DynamoDB tables from multiple accounts (dev-services, test-dataops, and dev-admin), you need to set up cross-account access. This involves configuring trust relationships between the dev-services account and the other accounts, allowing the datahub-ingestion role to assume roles in the test-dataops and dev-admin accounts.

Steps to Configure Cross-Account Access

  1. Create IAM Policies and Roles in Each Account: Define the necessary permissions and create IAM roles in each AWS account.
  2. Set Up Trust Relationships: Configure the trust relationships to allow the datahub-ingestion role in the dev-services account to assume roles in the test-dataops and dev-admin accounts.
  3. Configure the Ingestion Recipe: Set up the ingestion recipe to use the datahub-ingestion role and assume roles in the other accounts.

Example Configuration

Step 1: Create IAM Policies and Roles in Each Account
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": "*"
        }
    ]
}

Create IAM roles in the test-dataops and dev-admin accounts and attach the respective policies.

Step 2: Set Up Trust Relationships
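Configure the trust relationships to allow the datahub-ingestion role in the dev-services account to assume roles in the test-dataops and dev-admin accounts. The datahub-ingestion role in dev-services also needs an identity policy that allows sts:AssumeRole on those role ARNs.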

Example Trust Policy for test-dataops and dev-admin Accounts (replace the Principal value with the ARN of the datahub-ingestion role in dev-services):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dev-services-account-id>:role/datahub-ingestion"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Step 3: Configure the Ingestion Recipe
In your ingestion recipes, have the datahub-ingestion role in dev-services assume the role in each target account. A recipe run connects to one account, so create one recipe per account. aws_role accepts a single role ARN or a list; a list is assumed in order, each role using the credentials obtained from the previous one. (assume_role_arns is not among the documented options of the dynamodb source.) The recipe below covers test-dataops; use the same shape for dev-admin, and for dev-services itself list only the first role.

Example Ingestion Recipe for test-dataops:

source:
  type: dynamodb
  config:
    platform_instance: "test-dataops"  # Label for the account this recipe ingests
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_role:
      - "arn:aws:iam::<dev-services-account-id>:role/datahub-ingestion"  # Assumed first
      - "arn:aws:iam::<test-dataops-account-id>:role/datahub-ingestion"  # Assumed from the first role
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
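
The per-account role list can be generated instead of maintained by hand. The sketch below assumes that aws_role is given as a list of role ARNs assumed in order, starting from the datahub-ingestion role in dev-services. The account IDs are made-up placeholders, and role_chain and source_config are hypothetical helpers.

```python
from typing import Dict, List

PRIMARY_ACCOUNT = "dev-services"
ROLE_NAME = "datahub-ingestion"

# Placeholder IDs for illustration; replace with the real 12-digit account IDs.
ACCOUNT_IDS: Dict[str, str] = {
    "dev-services": "111111111111",
    "test-dataops": "222222222222",
    "dev-admin": "333333333333",
}


def role_arn(account: str) -> str:
    return f"arn:aws:iam::{ACCOUNT_IDS[account]}:role/{ROLE_NAME}"


def role_chain(account: str) -> List[str]:
    """Roles to assume, in order, to reach the given account."""
    chain = [role_arn(PRIMARY_ACCOUNT)]
    if account != PRIMARY_ACCOUNT:
        # Hop from the primary role into the target account's role.
        chain.append(role_arn(account))
    return chain


def source_config(account: str, region: str = "us-west-2") -> dict:
    return {
        "platform_instance": account,
        "aws_region": region,
        "aws_role": role_chain(account),
    }


configs = {account: source_config(account) for account in ACCOUNT_IDS}
for account, config in configs.items():
    print(account, config["aws_role"])
```

The dev-services entry gets a one-element chain, since its tables are reachable from the primary role without a further hop.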

Is it mandatory that I have to create a role in each account? is there any other way? <@U06TM7M40RK>

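Creating a role in each account is the usual approach, but it is not the only one. DynamoDB also supports resource-based policies, so each table in the test-dataops and dev-admin accounts can carry a policy that grants the datahub-ingestion role in dev-services access to it. An identity policy on the datahub-ingestion role is still required, but on its own it is not enough: a policy in dev-services cannot grant access to resources owned by another account unless that account allows it as well. One limit applies before choosing this route: dynamodb:ListTables returns only the tables of the caller's own account, so an ingestion run that discovers tables through ListTables will not find the tables in the other accounts. For automatic discovery across accounts, a role in each target account remains the dependable option.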

Steps to Configure Cross-Account Access Without Creating Additional Roles

  1. Create IAM Policy in the dev-services Account: Define the necessary permissions for accessing DynamoDB tables in all target accounts.
  2. Attach the Policy to the datahub-ingestion Role: Attach the policy to the datahub-ingestion role in the dev-services account.
  3. Configure the Ingestion Recipe: Set up the ingestion recipe to use the datahub-ingestion role.

Example Configuration

Step 1: Create IAM Policy in the dev-services Account
Create an IAM policy that grants the datahub-ingestion role permissions to access DynamoDB tables in the dev-services, test-dataops, and dev-admin accounts.

Example IAM Policy (dynamodb:ListTables does not support resource-level permissions, so it has its own statement with "Resource": "*"):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:ListTables",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:<dev-services-account-id>:table/*",
                "arn:aws:dynamodb:us-west-2:<test-dataops-account-id>:table/*",
                "arn:aws:dynamodb:us-west-2:<dev-admin-account-id>:table/*"
            ]
        }
    ]
}
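
When the list of accounts changes often, the table ARN patterns in this policy can be generated. The sketch below is illustrative: read_policy is a hypothetical helper and the account IDs are made-up placeholders. It keeps dynamodb:ListTables in a separate statement with "Resource": "*", because that action cannot be scoped to table ARNs.

```python
import json
from typing import List


def table_arn_pattern(region: str, account_id: str) -> str:
    """ARN pattern matching every table of one account in one region."""
    return f"arn:aws:dynamodb:{region}:{account_id}:table/*"


def read_policy(region: str, account_ids: List[str]) -> dict:
    """Identity policy for reading table metadata and items in the given accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListTables has no resource-level scoping, so it uses "*".
                "Effect": "Allow",
                "Action": "dynamodb:ListTables",
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
                "Resource": [table_arn_pattern(region, a) for a in account_ids],
            },
        ],
    }


# Placeholder account IDs for illustration only.
policy = read_policy("us-west-2", ["111111111111", "222222222222", "333333333333"])
print(json.dumps(policy, indent=4))
```

The generated document covers only the identity side; tables in other accounts are reachable only if those accounts grant access as well.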

Step 2: Attach the Policy to the datahub-ingestion Role
Attach the policy created in Step 1 to the datahub-ingestion role in the dev-services account.

Step 3: Configure the Ingestion Recipe
In your ingestion recipe, use only the datahub-ingestion role; in this setup there are no roles in the other accounts to assume. Because table discovery through ListTables is limited to the caller's own account, this recipe ingests the tables of dev-services.

Example Ingestion Recipe:

source:
  type: dynamodb
  config:
    platform_instance: "dev-services"  # Label for the account the tables belong to
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_role: "arn:aws:iam::<dev-services-account-id>:role/datahub-ingestion"  # The only role used
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

<@U06TM7M40RK> to use secret keys here instead of iam role

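If you prefer to use AWS access keys (secret keys) instead of IAM roles for ingesting DynamoDB tables from multiple accounts, you can configure your ingestion recipes to include the access keys for each account. This approach involves specifying the AWS access key ID and secret access key in each ingestion recipe. Avoid committing real keys to the recipe file: reference them through environment variables, for example "${AWS_ACCESS_KEY_ID}", which DataHub expands when the recipe runs.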

Steps to Configure Ingestion Using Secret Keys

  1. Create IAM Users in Each Account: Create IAM users in each AWS account and generate access keys for them.
  2. Attach Necessary Policies to IAM Users: Attach policies to the IAM users to grant them the necessary permissions to access DynamoDB tables.
  3. Configure the Ingestion Recipes: Set up separate ingestion recipes for each AWS account, specifying the respective access keys.

Example Configuration

Step 1: Create IAM Users in Each Account
Create IAM users in the dev-services, test-dataops, and dev-admin accounts. Generate access keys for each user.

Step 2: Attach Necessary Policies to IAM Users
Attach policies to the IAM users to grant them the necessary permissions to access DynamoDB tables.

Example IAM Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables",
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": "*"
        }
    ]
}

Step 3: Configure the Ingestion Recipes
Create separate ingestion recipes for each AWS account, specifying the respective access keys.

Example Ingestion Recipe for dev-services Account:

source:
  type: dynamodb
  config:
    platform_instance: "dev-services"  # Label for the dev-services account
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_access_key_id: "your_dev_services_access_key_id"
    aws_secret_access_key: "your_dev_services_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

Example Ingestion Recipe for test-dataops Account:

source:
  type: dynamodb
  config:
    platform_instance: "test-dataops"  # AWS account ID for test-dataops
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_access_key_id: "your_test_dataops_access_key_id"
    aws_secret_access_key: "your_test_dataops_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"

Example Ingestion Recipe for dev-admin Account:

source:
  type: dynamodb
  config:
    platform_instance: "dev-admin"  # AWS account ID for dev-admin
    aws_region: "us-west-2"  # Specify the AWS region if needed
    aws_access_key_id: "your_dev_admin_access_key_id"
    aws_secret_access_key: "your_dev_admin_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
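
If the keys are supplied through environment variables instead of being written into the recipes, a small pre-flight check can confirm that every account has its pair of variables set before any ingestion starts. This is a sketch under an assumed naming scheme (for example DEV_SERVICES_AWS_ACCESS_KEY_ID, which a recipe would reference as "${DEV_SERVICES_AWS_ACCESS_KEY_ID}"). The scheme and the helper names are hypothetical, not DataHub conventions.

```python
import os
from typing import Iterable, List, Mapping


def env_prefix(account: str) -> str:
    """Turn an account alias into a variable prefix: dev-services -> DEV_SERVICES."""
    return account.upper().replace("-", "_")


def credential_vars(account: str) -> List[str]:
    prefix = env_prefix(account)
    return [f"{prefix}_AWS_ACCESS_KEY_ID", f"{prefix}_AWS_SECRET_ACCESS_KEY"]


def missing_credentials(
    accounts: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Names of the expected variables that are unset or empty."""
    return [
        name
        for account in accounts
        for name in credential_vars(account)
        if not env.get(name)
    ]


missing = missing_credentials(["dev-services", "test-dataops", "dev-admin"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running the check before the first recipe avoids a partial ingestion in which some accounts succeed and others fail on authentication.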
