Ingesting Metadata from Multiple PostgreSQL Servers in a Single Ingestion

Original Slack Thread

<@U06TM7M40RK> - can I connect and ingest metadata from multiple postgres servers in a single ingestion?

Yes, you can connect and ingest metadata from multiple PostgreSQL servers in a single ingestion process using DataHub. This can be achieved by configuring multiple source connectors in your ingestion recipe. Each source connector can be configured to connect to a different PostgreSQL instance.

According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/postgres|DataHub documentation>, you can define multiple source configurations within a single recipe file. Each source configuration will specify the connection details for a different PostgreSQL server.

Here’s a basic example of how you might configure an ingestion recipe to connect to multiple PostgreSQL servers:

sources:
  - type: postgres
    config:
      host_port: "postgres-server-1:5432"
      database: "database1"
      username: "user1"
      password: "password1"
      include_tables: true
      include_views: true

  - type: postgres
    config:
      host_port: "postgres-server-2:5432"
      database: "database2"
      username: "user2"
      password: "password2"
      include_tables: true
      include_views: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

In this example, two PostgreSQL servers are specified, each with its own connection details. The metadata from both servers will be ingested into DataHub.
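When you manage many servers, the repetitive source entries can be generated programmatically before being written out as a YAML recipe. A sketch using the placeholder hosts and credentials from the recipe above (these are example values, not real defaults):

```python
# Sketch: build the per-server source entries in Python rather than
# copying them by hand. Host names and credentials below are the
# placeholder values from the recipe above.
servers = [
    {"host_port": "postgres-server-1:5432", "database": "database1",
     "username": "user1", "password": "password1"},
    {"host_port": "postgres-server-2:5432", "database": "database2",
     "username": "user2", "password": "password2"},
]

recipe = {
    "sources": [
        {
            "type": "postgres",
            # Every server shares the same table/view settings.
            "config": {**server, "include_tables": True, "include_views": True},
        }
        for server in servers
    ],
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}
```

From here, `yaml.safe_dump(recipe)` (with PyYAML installed) would produce a recipe file equivalent to the one shown above.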

Would you like more detailed steps or further assistance on this process?

<@U06TM7M40RK> - Also, can we ingest multiple databases the same time from different servers?

Yes, you can ingest multiple databases at the same time from different PostgreSQL servers using DataHub. This can be done by configuring your ingestion recipe to include multiple source configurations, each targeting a different server and specifying the databases you want to ingest.

According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/postgres|DataHub documentation>, you can use the database_pattern configuration to specify which databases to include or exclude during ingestion. This allows you to target multiple databases on each server.

Here’s an example of how you might configure an ingestion recipe to ingest multiple databases from different PostgreSQL servers:

sources:
  - type: postgres
    config:
      host_port: "postgres-server-1:5432"
      username: "user1"
      password: "password1"
      database_pattern:
        allow:
          - "database1"
          - "database2"
      include_tables: true
      include_views: true

  - type: postgres
    config:
      host_port: "postgres-server-2:5432"
      username: "user2"
      password: "password2"
      database_pattern:
        allow:
          - "database3"
          - "database4"
      include_tables: true
      include_views: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

In this example, the recipe is configured to ingest metadata from two PostgreSQL servers, each with multiple databases specified in the database_pattern configuration.
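The entries under allow are regular expressions, not just literal names. As a rough illustration of how such allow-list filtering behaves (a simplified sketch for intuition, not DataHub's actual implementation):

```python
import re

# Simplified illustration of allow-pattern filtering; DataHub's real
# AllowDenyPattern implementation may differ in anchoring and defaults.
def allowed(name, allow_patterns):
    """Return True if the name fully matches any allow pattern."""
    return any(re.fullmatch(p, name) for p in allow_patterns)

databases = ["database1", "database2", "database3", "analytics"]

# Only the databases matching an allow pattern survive the filter.
selected = [db for db in databases if allowed(db, ["database1", "database2"])]
# selected == ["database1", "database2"]
```

Because the entries are regexes, a single pattern like "database." would match database1 through database3 in one line.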

Would you like more detailed steps or further assistance on this process?

<@U06TM7M40RK> - How to get metedata for all the databases without providing the list of databases?

To ingest metadata for all databases from a PostgreSQL server without explicitly listing them, you can use the database_pattern configuration in your DataHub ingestion recipe. By setting the allow pattern to .*, you can include all databases available on the server.

According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/postgres|DataHub documentation>, the database_pattern option allows you to specify regex patterns to filter databases for ingestion. By default, it includes all databases if not specified otherwise.

Here’s an example of how you might configure your ingestion recipe to ingest metadata from all databases on a PostgreSQL server:

source:
  type: postgres
  config:
    host_port: "your-postgres-server:5432"
    username: "your-username"
    password: "your-password"
    database_pattern:
      allow:
        - ".*"  # This pattern allows all databases
    include_tables: true
    include_views: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

In this configuration, the database_pattern is set to allow all databases by using the regex pattern .*. This will ingest metadata from all databases available on the specified PostgreSQL server.

Would you like more detailed steps or further assistance on this process?

<@U06TM7M40RK> - after using the above pattern, I'm getting this error:
failed: FATAL: permission denied for database

The error message “FATAL: permission denied for database” indicates that the user account being used for the ingestion does not have the necessary permissions to access one or more databases on the PostgreSQL server. To resolve this issue, you need to ensure that the user has the appropriate permissions to access all the databases you intend to ingest.

According to a <Slack discussion>, the minimum permission required for DataHub to work with PostgreSQL databases is SELECT on the targeted tables. You need to grant this permission to the user account being used for ingestion.

Here’s how you can grant the necessary permissions:

  1. Connect to PostgreSQL: Use a PostgreSQL client to connect to your server as a superuser or a user with sufficient privileges to grant permissions.

  2. Grant Permissions: The “permission denied for database” error is raised at connection time, which means the user lacks the CONNECT privilege on that database. Grant both CONNECT and SELECT for each database you want to ingest:

    GRANT CONNECT ON DATABASE your_database TO your_user;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO your_user;

    Replace your_user and your_database with your own values. The user also needs USAGE on each schema; repeat the schema-level grants for every schema in every database you want to ingest.

  3. Verify Permissions: Confirm that the user has the necessary permissions by connecting to each database and running a simple SELECT query.
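When many databases are involved, the grant step can be scripted. A hypothetical helper (the function name and its defaults are illustrative, not part of DataHub) that emits the needed statements per database:

```python
def grant_statements(user, database, schemas=("public",)):
    """Emit the GRANT statements needed for metadata ingestion.

    CONNECT addresses the "permission denied for database" error;
    USAGE and SELECT cover reading table metadata in each schema.
    """
    stmts = [f"GRANT CONNECT ON DATABASE {database} TO {user};"]
    for schema in schemas:
        stmts.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        stmts.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")
    return stmts

# Print the statements so they can be pasted into a psql session.
for stmt in grant_statements("datahub_user", "database1"):
    print(stmt)
```

Run the output against each server as a user privileged enough to grant permissions; identifiers here are not quoted, so mixed-case names would need quoting added.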

If you continue to encounter issues, please check the specific database and schema permissions for the user account and ensure that there are no network or connectivity issues.

Would you like more detailed steps or further assistance on this process?

Sources:

  • <Slack discussion>

<@U06TM7M40RK> - can I ignore and continue if permission denied on database encountered?

DataHub does not currently have a built-in option to skip a database and continue when a “permission denied” error is encountered. The practical workaround is to exclude the problematic databases explicitly using the allow and deny patterns in the database_pattern configuration.

According to a <Slack discussion>, there is currently no configuration option to automatically ignore permission errors and continue the ingestion process. Instead, you can manually specify which databases to include or exclude using the database_pattern configuration.

Here’s a general approach to handle this situation:

  1. Identify Databases with Permission Issues: First, identify which databases are causing permission errors by reviewing the logs.

  2. Update Ingestion Recipe: Modify your ingestion recipe to exclude these databases using the deny pattern in the database_pattern configuration.

Example configuration:

source:
  type: postgres
  config:
    host_port: "your-postgres-server:5432"
    username: "your-username"
    password: "your-password"
    database_pattern:
      allow:
        - ".*"  # Allow all databases
      deny:
        - "database_with_permission_issue"  # Deny specific databases with permission issues
    include_tables: true
    include_views: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

In this example, the deny pattern is used to exclude specific databases that are known to have permission issues.
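Step 1 above can be partly automated by scanning the ingestion logs for the error message. A sketch, assuming log lines contain the PostgreSQL error text quoted earlier in this thread (the exact log format may differ in your setup):

```python
import re

# Example log lines; the format is an assumption based on the
# PostgreSQL error reported in this thread.
log_lines = [
    'Scanning database "database1"... ok',
    'failed: FATAL: permission denied for database "secret_db"',
    'failed: FATAL: permission denied for database "hr_db"',
]

# Collect the quoted database names from permission-denied errors.
pattern = re.compile(r'permission denied for database "([^"]+)"')
deny = [m.group(1) for line in log_lines if (m := pattern.search(line))]
# deny == ["secret_db", "hr_db"] — candidates for database_pattern.deny
```

The resulting names can then be pasted into the deny list of the recipe shown above.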

Would you like more detailed steps or further assistance on this process?

Sources:

  • <Slack discussion>

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><