Hi, we are trying to ingest only bucket and object tags using https://datahubproject.io/docs/generated/ingestion/sources/s3/. Can someone tell me which IAM permissions the plugin requires?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a 👍 or 👎 so other folks in the Community know if you’re still looking for help on the topic!
To ingest bucket and object tags from S3 into DataHub, you need to ensure that the IAM role or user you are using has the appropriate permissions. Specifically, you will need permissions to list the buckets and objects, as well as to get the tags for both.
Required IAM Permissions
- List Buckets and Objects:
  s3:ListBucket
  s3:ListAllMyBuckets
- Get Bucket and Object Tags:
  s3:GetBucketTagging
  s3:GetObjectTagging
Here is an example IAM policy that includes these permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListAllMyBuckets",
        "s3:GetBucketTagging",
        "s3:GetObjectTagging"
      ],
      "Resource": "*"
    }
  ]
}
Additional Context
- Using IAM Roles: If you are using IAM roles, you can specify the role to assume in your DataHub ingestion configuration. According to a Slack conversation, you can specify either an AWS profile or a role to assume.
- Least Privilege Principle: If you are looking to configure least privilege policies, you might want to refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/aws/glue.py|Glue IAM permissions documentation> for an example of how to set up permissions for other AWS services.
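As a rough illustration of the least-privilege idea, the wildcard policy above could be narrowed to specific buckets. The sketch below is only an assumption about how one might generate such a policy: the helper name `build_tag_read_policy` and the bucket name are hypothetical, and whether the DataHub S3 source needs further actions (for example s3:GetObject for reading file contents) is not confirmed in this thread.

```python
import json


def build_tag_read_policy(bucket_names):
    """Return an IAM policy dict scoped to the given buckets (hypothetical helper)."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListAllMyBuckets is account-wide and cannot be scoped to one bucket.
                "Sid": "ListAllBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketTagging"],
                "Resource": bucket_arns,
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": ["s3:GetObjectTagging"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # "your-bucket-name" is a placeholder, as in the recipe in this thread.
    print(json.dumps(build_tag_read_policy(["your-bucket-name"]), indent=2))
```

Splitting the statement this way keeps the bucket-level and object-level actions attached to the matching resource type, which is what S3 expects when "Resource" is not "*".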
Example Configuration
If you are using a YAML recipe for ingestion, your configuration might look something like this:
source:
  type: s3
  config:
    path_specs:
      - include: "s3://your-bucket-name/*"
    aws_config:
      aws_access_key_id: YOUR_ACCESS_KEY
      aws_secret_access_key: YOUR_SECRET_KEY
      aws_region: YOUR_AWS_REGION
    use_s3_bucket_tags: true
    use_s3_object_tags: true
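Before running the recipe, it may help to confirm that the credentials can actually make the four S3 calls listed earlier. The following is a minimal sketch, not part of DataHub: `check_tag_access` is a hypothetical helper that takes any object exposing the boto3 S3 client methods `list_buckets`, `list_objects_v2`, `get_bucket_tagging`, and `get_object_tagging`.

```python
def check_tag_access(s3_client, bucket):
    """Try each S3 call tag ingestion relies on; return {action: "ok" or error text}."""
    results = {}

    def attempt(action, call):
        # Record success or the error message for one IAM action.
        try:
            value = call()
            results[action] = "ok"
            return value
        except Exception as exc:  # boto3 raises botocore ClientError on AccessDenied
            results[action] = f"failed: {exc}"
            return None

    attempt("s3:ListAllMyBuckets", s3_client.list_buckets)
    listing = attempt(
        "s3:ListBucket",
        lambda: s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1),
    )
    attempt("s3:GetBucketTagging", lambda: s3_client.get_bucket_tagging(Bucket=bucket))

    # Object tags can only be probed if listing returned at least one key.
    contents = (listing or {}).get("Contents", [])
    if contents:
        key = contents[0]["Key"]
        attempt(
            "s3:GetObjectTagging",
            lambda: s3_client.get_object_tagging(Bucket=bucket, Key=key),
        )
    else:
        results["s3:GetObjectTagging"] = "skipped: no object to test against"
    return results


# Usage with real credentials (requires boto3 to be installed):
#   import boto3
#   print(check_tag_access(boto3.client("s3"), "your-bucket-name"))
```

Note that a bucket with no tags at all makes `get_bucket_tagging` fail with a NoSuchTagSet error, so a "failed" entry for that action is not always a permissions problem; the error text distinguishes the two cases.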
Follow-Up Questions
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using?
- Are there any specific constraints or additional requirements for your setup?
Feel free to provide more details so we can assist you better!
Sources
- <Slack Conversation on IAM Roles>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/aws/glue.py|Glue IAM Permissions Documentation>
- <Slack Conversation on Least Privilege Policies>
Hint: Mention <@U06TM7M40RK> in the thread for followups.