Connecting PostgreSQL to DataHub for Metadata Ingestion

Original Slack Thread

<@U06TM7M40RK> Can we connect PostgreSQL to DataHub?

Yes, you can connect PostgreSQL to DataHub for metadata ingestion. DataHub provides a Postgres ingestion source that extracts metadata for the databases, schemas, tables, and views in a PostgreSQL instance. You can also enable optional features such as data profiling and table-level lineage.

To connect PostgreSQL to DataHub, you need to:

  1. Install the Postgres Plugin: Install the Postgres plugin for DataHub by running the following command:

    pip install 'acryl-datahub[postgres]'
    
  2. Create an Ingestion Recipe: Create a YAML file for your ingestion recipe. Here is a sample configuration; replace the placeholder host, database, and credentials with your own:

    source:
      type: postgres
      config:
        host_port: 'localhost:5432'
        database: 'your_database'
        username: 'your_username'
        password: 'your_password'
        include_tables: true
        include_views: true
        profiling:
          enabled: true
    sink:
      type: datahub-rest
      config:
        server: 'http://localhost:8080'
    
  3. Run the Ingestion: Run the recipe with the DataHub CLI:

    datahub ingest -c your_recipe.yml
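
The same recipe can also be run from Python instead of the CLI. The following is a minimal sketch, assuming the `acryl-datahub[postgres]` package from step 1 is installed and a DataHub server is reachable at the sink URL. The helper names `build_recipe` and `run_ingestion` and the `POSTGRES_PASSWORD` environment variable are illustrative choices, not part of DataHub; the connection values are the same placeholders as in the YAML sample.

```python
import os


def build_recipe(host_port: str, database: str, username: str,
                 password_env: str = "POSTGRES_PASSWORD",
                 server: str = "http://localhost:8080") -> dict:
    """Return a recipe dict mirroring the YAML sample above.

    The password is read from an environment variable (hypothetical name)
    so that it is not hard-coded in the script.
    """
    return {
        "source": {
            "type": "postgres",
            "config": {
                "host_port": host_port,
                "database": database,
                "username": username,
                # Raises KeyError if the variable is unset, which fails early.
                "password": os.environ[password_env],
                "include_tables": True,
                "include_views": True,
                "profiling": {"enabled": True},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": server},
        },
    }


def run_ingestion(recipe: dict) -> None:
    """Run the recipe through DataHub's Python pipeline API."""
    # Imported lazily so build_recipe can be used without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raise an exception if the run reported failures.
    pipeline.raise_from_status()
```

With `POSTGRES_PASSWORD` exported in the shell, calling `run_ingestion(build_recipe("localhost:5432", "your_database", "your_username"))` should behave like the `datahub ingest` command in step 3.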
    

For the full list of configuration options, refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/postgres|Postgres Ingestion Source documentation>.

Would you like more detailed steps or help with a specific part of the setup?

