IAM Permissions Required for Ingesting Bucket and Object Tags from S3 into DataHub

Original Slack Thread

Hi, we are trying to ingest only bucket and object tags using https://datahubproject.io/docs/generated/ingestion/sources/s3/. Can someone tell me what IAM permissions the plugin requires?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To ingest bucket and object tags from S3 into DataHub, you need to ensure that the IAM role or user you are using has the appropriate permissions. Specifically, you will need permissions to list the buckets and objects, as well as to get the tags for both.

Required IAM Permissions

  1. List Buckets and Objects:

    • s3:ListBucket
    • s3:ListAllMyBuckets
  2. Get Bucket and Object Tags:

    • s3:GetBucketTagging
    • s3:GetObjectTagging

Here is an example IAM policy that includes these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListAllMyBuckets",
                "s3:GetBucketTagging",
                "s3:GetObjectTagging"
            ],
            "Resource": "*"
        }
    ]
}
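
The policy above uses "Resource": "*", which grants the tag-read actions on every bucket in the account. As a minimal sketch of a narrower alternative, the Python helper below builds the same four permissions scoped to named buckets. The function name build_tag_ingestion_policy and the bucket name are hypothetical and are not part of DataHub. It relies on how IAM scopes these actions: s3:ListAllMyBuckets only accepts "*", s3:ListBucket and s3:GetBucketTagging apply to the bucket ARN, and s3:GetObjectTagging applies to object ARNs.

```python
import json


def build_tag_ingestion_policy(bucket_names):
    """Return an IAM policy dict limited to the given buckets (hypothetical helper)."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListAllMyBuckets is account-level and only accepts "*".
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketTagging"],
                "Resource": bucket_arns,
            },
            {
                # Object-level actions apply to the keys inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObjectTagging"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # "your-bucket-name" is a placeholder, matching the recipe in this thread.
    print(json.dumps(build_tag_ingestion_policy(["your-bucket-name"]), indent=4))
```

Splitting the statements by resource type keeps each action attached to the kind of ARN it is evaluated against, so the object-level grant cannot be mistaken for a bucket-level one.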

Additional Context

Example Configuration

If you are using a YAML recipe for ingestion, your configuration might look something like this:

source:
  type: s3
  config:
    path_specs:
      - include: "s3://your-bucket-name/*"
    aws_config:
      aws_access_key_id: YOUR_ACCESS_KEY
      aws_secret_access_key: YOUR_SECRET_KEY
      aws_region: YOUR_AWS_REGION
    # Create DataHub tags from bucket-level and object-level S3 tags.
    use_s3_bucket_tags: true
    use_s3_object_tags: true
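
Before running the recipe, it can help to confirm that the credentials really are allowed to perform the four actions listed in this thread. The following is a rough sketch, not DataHub code: check_tag_permissions is a hypothetical helper that takes any object behaving like a boto3 S3 client and tries each call once. It assumes that errors carry a botocore-style response dictionary, and it treats the NoSuchTagSet error (returned for a bucket with no tags) as a sign that access was granted.

```python
def check_tag_permissions(s3_client, bucket, sample_key=None):
    """Try each S3 call once and report which ones succeed.

    `s3_client` is expected to behave like boto3.client("s3").
    This helper is hypothetical and not part of DataHub.
    """
    calls = {
        "s3:ListAllMyBuckets": lambda: s3_client.list_buckets(),
        "s3:ListBucket": lambda: s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1),
        "s3:GetBucketTagging": lambda: s3_client.get_bucket_tagging(Bucket=bucket),
    }
    # Object tagging can only be checked when a known object key is supplied.
    if sample_key is not None:
        calls["s3:GetObjectTagging"] = lambda: s3_client.get_object_tagging(
            Bucket=bucket, Key=sample_key
        )

    results = {}
    for action, call in calls.items():
        try:
            call()
            results[action] = "ok"
        except Exception as exc:  # boto3 raises botocore.exceptions.ClientError
            code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
            # An untagged bucket yields NoSuchTagSet, which still implies access.
            if code == "NoSuchTagSet":
                results[action] = "ok"
            else:
                results[action] = f"failed: {code or exc}"
    return results
```

With boto3 installed, a call such as check_tag_permissions(boto3.client("s3"), "your-bucket-name", "path/to/object") returns a dictionary mapping each action to "ok" or a failure message; the bucket name and key here are placeholders.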

Follow-Up Questions

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using?
  3. Are there any specific constraints or additional requirements for your setup?

Feel free to provide more details so we can assist you better!
