<@U06TM7M40RK> I want dynamodb databases only from specific accounts to be ingested. How to configure?
To ingest DynamoDB tables only from specific AWS accounts, supply credentials for those accounts in your ingestion recipe: the connector reads tables from whichever account its credentials belong to. The platform_instance field does not filter anything by itself. It is a label attached to the ingested datasets, and setting it to the AWS account ID keeps tables from different accounts distinguishable in DataHub.
Here is a basic example of how to set up your ingestion recipe:
source:
  type: dynamodb
  config:
    platform_instance: "AWS_ACCOUNT_ID" # Replace with your AWS account ID
    aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
    aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    aws_region: "${AWS_REGION}" # Required
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Steps to Configure:
- Label the AWS Account: Set the platform_instance field to the AWS account ID so that the ingested DynamoDB tables are labeled with the account they came from.
- Provide AWS Credentials: Supply credentials for the target account (aws_access_key_id and aws_secret_access_key). These determine which account is read.
- Specify AWS Region: Set the aws_region field to the region whose tables should be ingested.
Example Recipe:
source:
  type: dynamodb
  config:
    platform_instance: "123456789012" # Replace with your AWS account ID
    aws_access_key_id: "your_access_key_id"
    aws_secret_access_key: "your_secret_access_key"
    aws_region: "us-west-2" # Replace with your desired AWS region
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the AWS credentials provided have the necessary permissions to access DynamoDB tables in the specified account.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
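To make the role of platform_instance concrete, here is a small Python sketch of how the label might show up in a dataset URN. It is not DataHub's own code: the function name is made up for illustration, and it assumes that the connector names datasets as <region>.<table> and that the platform instance, when set, is prepended to that name.

```python
from typing import Optional


def dynamodb_dataset_urn(
    table: str,
    region: str,
    platform_instance: Optional[str] = None,
    env: str = "PROD",
) -> str:
    """Approximate the dataset URN for a DynamoDB table (illustrative only)."""
    # Assumption: the connector names datasets "<region>.<table>".
    name = f"{region}.{table}"
    # Assumption: a platform instance, when set, is prepended to the name.
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:dynamodb,{name},{env})"


if __name__ == "__main__":
    # Same table name and region, two different account labels.
    print(dynamodb_dataset_urn("orders", "us-west-2", "123456789012"))
    print(dynamodb_dataset_urn("orders", "us-west-2", "987654321098"))
```

In this sketch, two tables with the same name and region end up with different URNs only because the account label differs. The label itself does not decide which account is read.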
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb
<@U06TM7M40RK> what if I have to only use one iam_role?
To configure DataHub to ingest DynamoDB databases using a specific IAM role, you can leverage the AWS IAM role assumption feature. This approach enhances security by avoiding the use of static AWS credentials and instead using IAM roles that can be assumed by the DataHub ingestion process.
Here are the steps to configure your ingestion recipe to use a specific IAM role:
- Create an IAM Policy: Define the necessary permissions for the metadata ingestion.
- Create an IAM Role: Attach the policy to an IAM role.
- Associate IAM Role with Kubernetes Service Account: Link the IAM role to a Kubernetes service account.
- Configure the Ingestion Recipe: Set up the ingestion recipe to use the IAM role.
Step-by-Step Guide
Step 1: Create an IAM Policy
Create an IAM policy with the required permissions. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
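If "Resource": "*" is broader than you want, a tighter variant can be generated per account and region. The Python sketch below is one way to do that, not an official template: the function name is hypothetical, and it assumes that dynamodb:ListTables cannot be scoped to table ARNs and therefore stays on "*", while the table-level actions are limited to one account and region.

```python
import json


def dynamodb_read_policy(account_id: str, region: str) -> dict:
    """Build a read-only IAM policy for DynamoDB metadata ingestion."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: ListTables has no table-level resource, so "*".
                "Effect": "Allow",
                "Action": ["dynamodb:ListTables"],
                "Resource": "*",
            },
            {
                # Table-level reads are limited to one account and region.
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
                "Resource": [table_arn],
            },
        ],
    }


if __name__ == "__main__":
    # The account ID and region below are placeholders.
    print(json.dumps(dynamodb_read_policy("123456789012", "us-west-2"), indent=2))
```

The printed JSON can be pasted into the IAM console or passed to your infrastructure tooling in place of the wildcard policy.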
Step 2: Create an IAM Role
Create an IAM role and attach the policy created in Step 1. Ensure the role can be assumed by the service or instance running the DataHub ingestion.
Step 3: Associate IAM Role with Kubernetes Service Account
Follow the AWS EKS guide (https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html) to associate a Kubernetes service account with the IAM role. Here is an example command using eksctl:
eksctl create iamserviceaccount \
  --name acryl-datahub-actions \
  --namespace datahub \
  --cluster datahub \
  --attach-policy-arn arn:aws:iam::<<account-id>>:policy/policy1 \
  --approve \
  --override-existing-serviceaccounts
Update the values.yaml file for the DataHub Helm chart to use the service account:
acryl-datahub-actions:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions
Step 4: Configure the Ingestion Recipe
In your ingestion recipe, you can omit the AWS access key and secret key. The DataHub ingestion process will use the default AWS credential provider chain, which includes the IAM role.
Example Ingestion Recipe for DynamoDB:
source:
  type: dynamodb
  config:
    platform_instance: "AWS_ACCOUNT_ID" # Replace with your AWS account ID
    aws_region: "us-west-2" # Required
    aws_role: "arn:aws:iam::<<account-id>>:role/<<role-name>>" # Specify the IAM role ARN
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the IAM role specified in the aws_role field has the necessary permissions to access DynamoDB tables in the specified account.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- DataHub AWS Deployment Guide: https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md
How can I ingest tables from multiple accounts? <@U06TM7M40RK>
To ingest DynamoDB tables from multiple AWS accounts, you can configure multiple ingestion recipes, each specifying a different AWS account ID and IAM role. This approach allows you to manage and ingest metadata from various accounts separately.
Steps to Configure Ingestion from Multiple Accounts
- Create IAM Policies and Roles: For each AWS account, create an IAM policy with the necessary permissions and attach it to an IAM role.
- Associate IAM Roles with Kubernetes Service Accounts: Link each IAM role to a Kubernetes service account.
- Configure Multiple Ingestion Recipes: Set up separate ingestion recipes for each AWS account, specifying the respective IAM role and account ID.
Example Configuration
Step 1: Create IAM Policies and Roles
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
Create IAM roles in each AWS account and attach the respective policies.
Step 2: Associate IAM Roles with Kubernetes Service Accounts
Follow the AWS EKS guide (https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html) to associate Kubernetes service accounts with the IAM roles. Here are example commands using eksctl:
eksctl create iamserviceaccount \
  --name acryl-datahub-actions-account1 \
  --namespace datahub \
  --cluster datahub \
  --attach-policy-arn arn:aws:iam::<<account1-id>>:policy/policy1 \
  --approve \
  --override-existing-serviceaccounts

eksctl create iamserviceaccount \
  --name acryl-datahub-actions-account2 \
  --namespace datahub \
  --cluster datahub \
  --attach-policy-arn arn:aws:iam::<<account2-id>>:policy/policy2 \
  --approve \
  --override-existing-serviceaccounts
Update the values.yaml file for the DataHub Helm chart to use the service accounts:
acryl-datahub-actions-account1:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions-account1
acryl-datahub-actions-account2:
  enabled: true
  serviceAccount:
    name: acryl-datahub-actions-account2
Step 3: Configure Multiple Ingestion Recipes
Create separate ingestion recipes for each AWS account, specifying the respective IAM role and account ID.
Example Ingestion Recipe for Account 1:
source:
  type: dynamodb
  config:
    platform_instance: "123456789012" # Replace with AWS account ID for account 1
    aws_region: "us-west-2" # Required
    aws_role: "arn:aws:iam::123456789012:role/role1" # Specify the IAM role ARN for account 1
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Example Ingestion Recipe for Account 2:
source:
  type: dynamodb
  config:
    platform_instance: "987654321098" # Replace with AWS account ID for account 2
    aws_region: "us-east-1" # Required
    aws_role: "arn:aws:iam::987654321098:role/role2" # Specify the IAM role ARN for account 2
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the IAM roles specified in the aws_role field have the necessary permissions to access DynamoDB tables in the respective accounts.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
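With one recipe per account it is easy to pair a role from one account with the label of another. The Python sketch below is a hypothetical pre-flight check, not part of DataHub. It assumes each recipe has already been loaded into a dict, that platform_instance holds the 12-digit account ID as in the recipes above, and that aws_role is a single role ARN string.

```python
def arn_account_id(role_arn: str) -> str:
    """Return the account ID embedded in an IAM role ARN."""
    parts = role_arn.split(":")
    # arn:aws:iam::<account-id>:role/<name> keeps the account ID in field 4.
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {role_arn!r}")
    return parts[4]


def check_recipe(recipe: dict) -> list:
    """Return a list of problems found in a DynamoDB recipe dict."""
    config = recipe.get("source", {}).get("config", {})
    problems = []
    if not config.get("aws_region"):
        problems.append("aws_region is missing")
    role = config.get("aws_role")
    instance = config.get("platform_instance")
    if role and instance and arn_account_id(role) != instance:
        problems.append(
            f"aws_role belongs to account {arn_account_id(role)}, "
            f"but platform_instance is {instance}"
        )
    return problems


if __name__ == "__main__":
    recipe = {
        "source": {
            "type": "dynamodb",
            "config": {
                "platform_instance": "123456789012",
                "aws_region": "us-west-2",
                "aws_role": "arn:aws:iam::987654321098:role/role2",
            },
        }
    }
    for problem in check_recipe(recipe):
        print(problem)
```

Running a check like this before each ingestion catches a mislabeled account before its tables land under the wrong platform instance.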
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb
what if I have only one role which is associated with a specific account, but i also want to use it for other accounts? <@U06TM7M40RK>
If you have a single IAM role associated with a specific AWS account but want to use it to ingest DynamoDB tables from multiple accounts, you can configure cross-account access. This involves setting up trust relationships between the accounts, allowing the IAM role in the primary account to assume roles in the other accounts.
Steps to Configure Cross-Account Access
- Create IAM Policies and Roles in Each Account: Define the necessary permissions and create IAM roles in each AWS account.
- Set Up Trust Relationships: Configure the trust relationships to allow the primary IAM role to assume roles in the other accounts.
- Configure the Ingestion Recipe: Set up the ingestion recipe to use the primary IAM role and assume roles in the other accounts.
Example Configuration
Step 1: Create IAM Policies and Roles in Each Account
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
Create IAM roles in each AWS account and attach the respective policies.
Step 2: Set Up Trust Relationships
Configure the trust relationships to allow the primary IAM role to assume roles in the other accounts. This involves updating the trust policy of the IAM roles in the other accounts to trust the primary IAM role.
Example Trust Policy for Account 2 (replace the principal with the ARN of the primary IAM role):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/primary-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
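Cross-account role assumption needs two documents: the trust policy on each target role, and a permission on the primary role that lets it call sts:AssumeRole on those targets. The Python sketch below generates both as plain JSON, without inline comments, which JSON does not allow. The function names are hypothetical and the ARNs are the placeholder values used in this thread.

```python
import json


def trust_policy(trusted_role_arn: str) -> dict:
    """Trust policy for a target role: who is allowed to assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permissions(target_role_arns: list) -> dict:
    """Identity policy for the primary role: which roles it may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": list(target_role_arns),
            }
        ],
    }


if __name__ == "__main__":
    primary = "arn:aws:iam::123456789012:role/primary-role"
    targets = [
        "arn:aws:iam::987654321098:role/role2",
        "arn:aws:iam::112233445566:role/role3",
    ]
    # Attach this to each target role as its trust relationship.
    print(json.dumps(trust_policy(primary), indent=2))
    # Attach this to the primary role as a permissions policy.
    print(json.dumps(assume_role_permissions(targets), indent=2))
```

If either side is missing, the AssumeRole call from the primary role is denied.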
Step 3: Configure the Ingestion Recipe
A recipe uses one set of credentials per run, so it reads tables from one account at a time, and the AWS options documented for the connector do not include an assume_role_arns field. Instead, run one recipe per target account with aws_role set to the role in that account. The primary role's credentials come from the default AWS credential chain and are used to assume it.
Example Ingestion Recipe for Account 2:
source:
  type: dynamodb
  config:
    platform_instance: "987654321098" # AWS account ID for account 2
    aws_region: "us-west-2" # Required
    aws_role: "arn:aws:iam::987654321098:role/role2" # Role in account 2, assumed by the primary role
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the IAM roles assumed in the other accounts have the necessary permissions to access DynamoDB tables in the respective accounts.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1716493835.116579
- https://datahubspace.slack.com/archives/CV2UVAPPG/p1722999266.340649
<@U06TM7M40RK> I have an account called dev-services which has a role called datahub-ingestion. I want to use only this role to ingest dynamodb tables from multiple accounts like dev-services, test-dataops and dev-admin. How can I do this?
To use a single IAM role (datahub-ingestion in the dev-services account) to ingest DynamoDB tables from multiple accounts (dev-services, test-dataops, and dev-admin), you need to set up cross-account access. This involves configuring trust relationships between the dev-services account and the other accounts, allowing the datahub-ingestion role to assume roles in the test-dataops and dev-admin accounts.
Steps to Configure Cross-Account Access
- Create IAM Policies and Roles in Each Account: Define the necessary permissions and create IAM roles in each AWS account.
- Set Up Trust Relationships: Configure the trust relationships to allow the datahub-ingestion role in the dev-services account to assume roles in the test-dataops and dev-admin accounts.
- Configure the Ingestion Recipe: Set up the ingestion recipe to use the datahub-ingestion role and assume roles in the other accounts.
Example Configuration
Step 1: Create IAM Policies and Roles in Each Account
Create IAM policies with the required permissions for each AWS account. For example, for DynamoDB ingestion, you might need permissions like dynamodb:ListTables, dynamodb:DescribeTable, etc.
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
Create IAM roles in the test-dataops and dev-admin accounts and attach the respective policies.
Step 2: Set Up Trust Relationships
Configure the trust relationships to allow the datahub-ingestion role in the dev-services account to assume roles in the test-dataops and dev-admin accounts.
Example Trust Policy for test-dataops and dev-admin Accounts (replace the principal with the ARN of the datahub-ingestion role):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<dev-services-account-id>:role/datahub-ingestion"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Step 3: Configure the Ingestion Recipe
In your ingestion recipe, set aws_role to the role in the target account. The credentials of the datahub-ingestion role in dev-services, picked up from the default AWS credential chain, are used to assume it. The AWS options documented for the connector do not include an assume_role_arns field, and a recipe reads from one account per run, so use one recipe per account. For dev-services itself, aws_role can be omitted when the ingestion process already runs as datahub-ingestion.
Example Ingestion Recipe for the test-dataops Account:
source:
  type: dynamodb
  config:
    platform_instance: "test-dataops" # Label for the account being ingested
    aws_region: "us-west-2" # Required
    aws_role: "arn:aws:iam::<test-dataops-account-id>:role/datahub-ingestion" # Role in test-dataops, assumed by the dev-services role
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the IAM roles assumed in the test-dataops and dev-admin accounts have the necessary permissions to access DynamoDB tables in the respective accounts.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
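One way to lay this out for the three accounts is to generate one recipe per account from a single list. The Python sketch below rests on several assumptions: the ingestion process already runs with the credentials of the datahub-ingestion role in dev-services, a role with the same name exists in the other two accounts, aws_role takes a single role ARN string as in the recipes in this thread, and the account IDs are made-up placeholders.

```python
import json

BASE_ACCOUNT = "dev-services"
ROLE_NAME = "datahub-ingestion"

# Hypothetical account IDs; replace with the real ones.
ACCOUNT_IDS = {
    "dev-services": "111111111111",
    "test-dataops": "222222222222",
    "dev-admin": "333333333333",
}


def recipe_for(account: str, region: str = "us-west-2") -> dict:
    """Build a DynamoDB recipe dict for one account."""
    config = {"platform_instance": account, "aws_region": region}
    if account != BASE_ACCOUNT:
        # Other accounts are reached by assuming a role that lives there.
        config["aws_role"] = f"arn:aws:iam::{ACCOUNT_IDS[account]}:role/{ROLE_NAME}"
    return {
        "source": {"type": "dynamodb", "config": config},
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-datahub-gms:8080"},
        },
    }


if __name__ == "__main__":
    for account in ACCOUNT_IDS:
        print(f"# recipe for {account}")
        print(json.dumps(recipe_for(account), indent=2))
```

Each printed recipe maps directly onto the YAML layout used above. The dev-services recipe carries no aws_role because no role needs to be assumed there.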
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb
Is it mandatory that I have to create a role in each account? is there any other way? <@U06TM7M40RK>
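Creating a role in each account is not strictly the only option, but a policy on the datahub-ingestion role in the dev-services account is not enough on its own: for cross-account access, the account that owns the tables must also grant it. Without roles in the test-dataops and dev-admin accounts, that grant has to come from resource-based policies attached to the DynamoDB tables in those accounts, naming the datahub-ingestion role as the principal. Be aware that dynamodb:ListTables only returns tables in the account of the calling credentials, so table discovery in the other accounts may not work this way, and assuming a role in each account remains the more reliable setup. The steps below cover the dev-services side.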
Steps to Configure Cross-Account Access Without Creating Additional Roles
- Create IAM Policy in the dev-services Account: Define the necessary permissions for accessing DynamoDB tables in all target accounts.
- Attach the Policy to the datahub-ingestion Role: Attach the policy to the datahub-ingestion role in the dev-services account.
- Configure the Ingestion Recipe: Set up the ingestion recipe to use the datahub-ingestion role.
Example Configuration
Step 1: Create IAM Policy in the dev-services Account
Create an IAM policy that grants the datahub-ingestion role permissions to access DynamoDB tables in the dev-services, test-dataops, and dev-admin accounts.
Example IAM Policy (dynamodb:ListTables is granted on "*" because it cannot be scoped to table ARNs):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dynamodb:ListTables",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-west-2:<dev-services-account-id>:table/*",
        "arn:aws:dynamodb:us-west-2:<test-dataops-account-id>:table/*",
        "arn:aws:dynamodb:us-west-2:<dev-admin-account-id>:table/*"
      ]
    }
  ]
}
Step 2: Attach the Policy to the datahub-ingestion Role
Attach the policy created in Step 1 to the datahub-ingestion role in the dev-services account.
Step 3: Configure the Ingestion Recipe
In your ingestion recipe, configure the datahub-ingestion role. No roles in other accounts are assumed in this setup, so the recipe does not list any.
Example Ingestion Recipe:
source:
  type: dynamodb
  config:
    platform_instance: "dev-services" # Label for the account whose credentials are used
    aws_region: "us-west-2" # Required
    aws_role: "arn:aws:iam::<dev-services-account-id>:role/datahub-ingestion" # Primary IAM role ARN
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the datahub-ingestion role in the dev-services account has the necessary permissions to access DynamoDB tables in the test-dataops and dev-admin accounts.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
Would you like more detailed steps or additional information on configuring this setup?
Sources:
- https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb
<@U06TM7M40RK> to use secret keys here instead of iam role
If you prefer to use AWS access keys (secret keys) instead of IAM roles for ingesting DynamoDB tables from multiple accounts, you can configure your ingestion recipes to include the access keys for each account. This approach involves specifying the AWS access key ID and secret access key directly in the ingestion recipe.
Steps to Configure Ingestion Using Secret Keys
- Create IAM Users in Each Account: Create IAM users in each AWS account and generate access keys for them.
- Attach Necessary Policies to IAM Users: Attach policies to the IAM users to grant them the necessary permissions to access DynamoDB tables.
- Configure the Ingestion Recipes: Set up separate ingestion recipes for each AWS account, specifying the respective access keys.
Example Configuration
Step 1: Create IAM Users in Each Account
Create IAM users in the dev-services, test-dataops, and dev-admin accounts. Generate access keys for each user.
Step 2: Attach Necessary Policies to IAM Users
Attach policies to the IAM users to grant them the necessary permissions to access DynamoDB tables.
Example IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:ListTables",
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": "*"
    }
  ]
}
Step 3: Configure the Ingestion Recipes
Create separate ingestion recipes for each AWS account, specifying the respective access keys.
Example Ingestion Recipe for dev-services Account:
source:
  type: dynamodb
  config:
    platform_instance: "dev-services" # Label for the dev-services account
    aws_region: "us-west-2" # Required
    aws_access_key_id: "your_dev_services_access_key_id"
    aws_secret_access_key: "your_dev_services_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Example Ingestion Recipe for test-dataops Account:
source:
  type: dynamodb
  config:
    platform_instance: "test-dataops" # Label for the test-dataops account
    aws_region: "us-west-2" # Required
    aws_access_key_id: "your_test_dataops_access_key_id"
    aws_secret_access_key: "your_test_dataops_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Example Ingestion Recipe for dev-admin Account:
source:
  type: dynamodb
  config:
    platform_instance: "dev-admin" # Label for the dev-admin account
    aws_region: "us-west-2" # Required
    aws_access_key_id: "your_dev_admin_access_key_id"
    aws_secret_access_key: "your_dev_admin_secret_access_key"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
Important Notes:
- Ensure that the IAM users have the necessary permissions to access DynamoDB tables in their respective accounts.
- The aws_region field is required in the latest versions of the DynamoDB connector, as it no longer loops through all AWS regions by default (source: https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dynamodb).
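Rather than pasting key values into each recipe, the recipes can reference environment variables with the ${...} placeholder syntax shown in the first recipe of this thread. The Python sketch below renders one such recipe per account. The environment variable naming scheme (for example DEV_SERVICES_AWS_ACCESS_KEY_ID) and the function names are assumptions chosen for illustration.

```python
RECIPE_TEMPLATE = """\
source:
  type: dynamodb
  config:
    platform_instance: "{account}"
    aws_region: "{region}"
    aws_access_key_id: "${{{prefix}_AWS_ACCESS_KEY_ID}}"
    aws_secret_access_key: "${{{prefix}_AWS_SECRET_ACCESS_KEY}}"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
"""


def env_prefix(account: str) -> str:
    """Turn an account name such as dev-services into DEV_SERVICES."""
    return account.upper().replace("-", "_")


def render_recipe(account: str, region: str) -> str:
    """Render recipe text whose credentials are environment placeholders."""
    return RECIPE_TEMPLATE.format(
        account=account, region=region, prefix=env_prefix(account)
    )


if __name__ == "__main__":
    for account in ("dev-services", "test-dataops", "dev-admin"):
        print(f"# ---- {account} ----")
        print(render_recipe(account, "us-west-2"))
```

The secrets then live only in the environment of the process that runs the ingestion, and the recipe files themselves can be committed or shared safely.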
Would you like more detailed steps or additional information on configuring this setup?