<@U06TM7M40RK> What are the permissions required on AWS to pull metadata from MWAA to Datahub?
To pull metadata from Amazon Managed Workflows for Apache Airflow (MWAA) into DataHub, you need to ensure that your AWS credentials have the necessary permissions. However, the provided data sources do not explicitly mention MWAA.
For similar AWS services, such as AWS Glue, the required permissions typically include actions to list and get metadata. For MWAA, note that the IAM service prefix is `airflow` (not `mwaa`): the read-oriented actions cover listing and describing environments, plus creating tokens to reach the Airflow CLI/web server.
Here is an example IAM policy along those lines, which you can adapt to your setup:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "airflow:ListEnvironments",
        "airflow:GetEnvironment",
        "airflow:ListTagsForResource",
        "airflow:CreateCliToken",
        "airflow:CreateWebLoginToken"
      ],
      "Resource": "*"
    }
  ]
}
```
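If you want to double-check a policy document before attaching it, a small stdlib-only helper (illustrative, not part of DataHub or the AWS SDK) can list the actions an IAM policy grants so you can compare them against what the ingestion needs:

```python
# Illustrative helper: parse an IAM policy document and collect the actions
# granted by its Allow statements.
import json

def allowed_actions(policy_json: str) -> set:
    """Return the set of actions granted by Allow statements in a policy."""
    policy = json.loads(policy_json)
    actions = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        acts = stmt.get("Action", [])
        if isinstance(acts, str):  # "Action" may be a single string
            acts = [acts]
        actions.update(acts)
    return actions
```

For example, `allowed_actions` applied to the policy above returns the five `airflow:*` actions, and returns an empty set for a policy containing only Deny statements.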
Would you like more details on how to configure these permissions or any other assistance?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/glue|AWS Glue IAM Permissions>
- <https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md|AWS Glue Schema Registry>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> Create recipe in this case?
To create an ingestion recipe for pulling metadata from Amazon Managed Workflows for Apache Airflow (MWAA) into DataHub, you need to define the source and sink configurations in a YAML file. Although the provided data sources do not explicitly mention MWAA, you can adapt the general approach used for other AWS services.
Here is an example ingestion recipe (`mwaa_ingestion.yaml`) that you can use as a template:
```yaml
source:
  type: "airflow"
  config:
    airflow_url: "https://your-mwaa-endpoint.amazonaws.com"
    username: "your_airflow_username"
    password: "your_airflow_password"
    env: "PROD"
    include_task_templates: true
    include_lineage: true
    include_task_instances: true
    include_dag_runs: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Replace with your DataHub instance URL
    token: "your_datahub_token"  # Replace with your DataHub auth token if required
```
Explanation:
- Source configuration:
  - `type`: the source type, `airflow` in this case.
  - `airflow_url`: the URL of your MWAA endpoint.
  - `username` and `password`: credentials for accessing MWAA.
  - `env`: the environment name (e.g., `PROD`).
  - `include_task_templates`, `include_lineage`, `include_task_instances`, `include_dag_runs`: flags to include various metadata from Airflow.
- Sink configuration:
  - `type`: the sink type, `datahub-rest`.
  - `server`: the URL of your DataHub instance.
  - `token`: your DataHub authentication token.
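One note on the sink credentials: DataHub recipes support environment-variable substitution, so the token does not have to be hard-coded in the YAML (the variable name below is an example you would set yourself before running the ingestion):

```yaml
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
    token: "${DATAHUB_GMS_TOKEN}"  # resolved from the environment at ingest time
```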
Running the Ingestion:
To run the ingestion with the above recipe, use the following command:
```shell
datahub ingest -c mwaa_ingestion.yaml
```
This command will read the metadata from MWAA and ingest it into your DataHub instance.
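As a complementary check outside DataHub, you can confirm that your AWS credentials can actually reach the MWAA environment. The sketch below uses boto3's `create_cli_token` and MWAA's documented `/aws_mwaa/cli` endpoint to list DAGs; the environment name is a placeholder, and this is a standalone illustration rather than part of the DataHub ingestion itself:

```python
# Illustrative sketch: fetch DAG metadata directly from MWAA via its CLI
# endpoint, using a short-lived token from boto3's create_cli_token.
import base64
import json
import urllib.request

def build_cli_request(web_server_hostname, cli_token, command="dags list -o json"):
    """Build the POST request for MWAA's /aws_mwaa/cli endpoint."""
    return urllib.request.Request(
        url=f"https://{web_server_hostname}/aws_mwaa/cli",
        data=command.encode("utf-8"),
        headers={"Authorization": f"Bearer {cli_token}",
                 "Content-Type": "text/plain"},
        method="POST",
    )

def list_mwaa_dags(environment_name):
    """List DAGs in an MWAA environment (requires AWS credentials and boto3)."""
    import boto3  # lazy import so build_cli_request works without boto3 installed
    token = boto3.client("mwaa").create_cli_token(Name=environment_name)
    req = build_cli_request(token["WebServerHostname"], token["CliToken"])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # MWAA returns the Airflow CLI's stdout/stderr base64-encoded
    return base64.b64decode(body["stdout"]).decode("utf-8")
```

Calling `list_mwaa_dags("your-environment-name")` with valid credentials returns the Airflow CLI's JSON listing of DAGs; a permissions error here usually means the `airflow:CreateCliToken` action is missing from your policy.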
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- DataHub Ingestion Overview
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @DataHub Community Support if you require a person to take a look.
Hope this helps! ><