Ingesting Airflow Metadata Using DataHub CLI and Recipe Configuration

Original Slack Thread

<@U06TM7M40RK> is it possible to run CLI ingestion for the Airflow plugin type? If so, please give me an example of the recipe.

To run CLI ingestion for Airflow metadata with DataHub, first make sure the DataHub Airflow plugin is installed and configured. The plugin normally captures metadata and lineage automatically when Airflow DAGs run. If you want to ingest Airflow metadata manually instead, you can create a recipe file and run it with the DataHub CLI.

Here’s a basic example of how you might configure a recipe for ingesting Airflow metadata:

  1. Install the DataHub Airflow Plugin: Ensure that the DataHub Airflow plugin is installed in your Airflow environment. You can install it using:

    pip install 'acryl-datahub-airflow-plugin[plugin-v2]'
    

    The v2 plugin requires Airflow 2.3+ and Python 3.8+, per the <https://github.com/datahub-project/datahub/blob/master/docs/lineage/airflow.md|Airflow Plugin v2 documentation>.

  2. Create a Recipe File: Create a YAML file that defines the source and sink for metadata ingestion. Here is an example configuration:

    # airflow_recipe.yml
    source:
      type: airflow
      config:
        cluster: "your_airflow_cluster"
        airflow_url: "http://your-airflow-instance:8080"
        username: "your_username"
        password: "your_password"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"
    

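    In this example, replace "your_airflow_cluster", "http://your-airflow-instance:8080", "your_username", and "your_password" with your actual Airflow cluster details and credentials. The `airflow` source type is shown as suggested in this thread; run `datahub check plugins` to confirm that your installed CLI actually lists it, since the Airflow integration is normally delivered through the plugin rather than through a pull-based source.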

  3. Run the Ingestion: Use the DataHub CLI to run the ingestion based on your recipe file.

    datahub ingest -c path/to/airflow_recipe.yml
    

This command will ingest metadata from the Airflow instance specified in the recipe file and send it to the DataHub instance.
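
If you want to script the run step, the following is a minimal sketch in Python, not an official DataHub helper. It checks that a recipe (already loaded into a dict) has a `source` and a `sink` with a `type` each, then builds the same `datahub ingest -c` command shown above. The function names `check_recipe`, `build_ingest_command`, and `run_ingestion` are hypothetical and exist only for illustration; the check covers structure only and does not verify that a given source type exists in your DataHub installation.

```python
import shutil
import subprocess
from typing import Any, Dict, List


def check_recipe(recipe: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems in a recipe dict (empty if none).

    Hypothetical helper: it only checks that 'source' and 'sink' exist
    and that each declares a 'type'.
    """
    problems = []
    for section in ("source", "sink"):
        block = recipe.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing '{section}' section")
        elif not block.get("type"):
            problems.append(f"'{section}' has no 'type'")
    return problems


def build_ingest_command(recipe_path: str) -> List[str]:
    """Build the argument list for `datahub ingest -c <recipe_path>`."""
    return ["datahub", "ingest", "-c", recipe_path]


def run_ingestion(recipe_path: str) -> int:
    """Run the DataHub CLI on the recipe and return its exit code."""
    # Fail early with a clear message if the CLI is not installed.
    if shutil.which("datahub") is None:
        raise RuntimeError("datahub CLI not found on PATH")
    return subprocess.run(build_ingest_command(recipe_path)).returncode
```

Calling `run_ingestion("path/to/airflow_recipe.yml")` would then invoke the CLI exactly as in step 3 and return its exit code, which makes it easier to call from a scheduler or CI job.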

Would you like more detailed steps or additional information on any specific part of the process?
