Troubleshooting IBM DB2 Scanning with Datahub CLI

Original Slack Thread

<@U04PK0DP563> I see you were able to scan IBM DB2. We are also trying to scan IBM DB2 using the DataHub CLI but are getting an error. I installed SQLAlchemy and the ibm-db-sa dialect on the machine where the DataHub CLI is running, but we still get the error. Can you please share how you ingested IBM DB2?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To assist you better, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery)?

This information will help in providing a more accurate and detailed response.

In the meantime, here is some general guidance based on the available data sources:

General Guidance

According to the <https://datahubproject.io/docs/0.13.1/ui-ingestion|DataHub Ingestion documentation>, DataHub supports creating, configuring, scheduling, and executing batch metadata ingestion through the DataHub user interface. This includes a variety of natively supported integrations, such as Snowflake, Postgres, and Kafka. DB2 does not have a dedicated connector, so you can use the generic sqlalchemy source type instead.
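
Before the recipe can run, the machine executing the CLI needs the SQLAlchemy plugin and the IBM DB2 dialect installed (typically `pip install 'acryl-datahub[sqlalchemy]' ibm-db-sa`). As a quick sanity check, the following stdlib sketch verifies the relevant modules are importable (module names are assumptions: the ibm-db-sa package installs as the `ibm_db_sa` module, which in turn needs the `ibm_db` driver):

```python
import importlib.util

# Module names assumed: the DB2 dialect installs as 'ibm_db_sa'
# (pip package 'ibm-db-sa'), which depends on the 'ibm_db' driver.
required = ("sqlalchemy", "ibm_db", "ibm_db_sa")
missing = [m for m in required if importlib.util.find_spec(m) is None]
if missing:
    print("Missing modules:", ", ".join(missing))
    print("Try: pip install 'acryl-datahub[sqlalchemy]' ibm-db-sa")
else:
    print("All DB2 ingestion prerequisites are importable.")
```

Running this on the machine where the CLI executes (not your laptop, if ingestion runs remotely) rules out missing-dialect errors before you debug the recipe itself.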

Example Ingestion Recipe for DB2

Here is an example of an ingestion recipe for DB2 using SQLAlchemy, as shared in a previous Slack conversation:

source:
    type: sqlalchemy
    config:
        platform: db2
        connect_uri: 'db2+ibm_db://user:password@host:port/database'
        include_tables: true
        include_views: true

sink:
    type: datahub-rest
    config:
        server: ${DATAHUB_GMS_HOST}
        token: ${DATAHUB_GMS_TOKEN}
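
A common cause of connection errors with this recipe is special characters in the username or password breaking the connect_uri. A small stdlib sketch that percent-encodes them safely (all credentials, hosts, and database names below are hypothetical placeholders):

```python
from urllib.parse import quote_plus

# Hypothetical credentials -- replace with your own.
user = "db2inst1"
password = "p@ss:word/123"   # special characters must be percent-encoded
host = "db2.example.com"
port = 50000                 # DB2's common default non-SSL port
database = "SAMPLE"

# Percent-encode user and password so ':' '@' '/' inside them
# don't break URI parsing.
connect_uri = (
    f"db2+ibm_db://{quote_plus(user)}:{quote_plus(password)}"
    f"@{host}:{port}/{database}"
)
print(connect_uri)
# -> db2+ibm_db://db2inst1:p%40ss%3Aword%2F123@db2.example.com:50000/SAMPLE
```

Paste the printed value into the recipe's connect_uri field.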

Steps to Scan DB2 Databases

  1. Create an Ingestion Source: Navigate to the ‘Ingestion’ tab in DataHub and click on + Create new source.
  2. Select a Platform Template: Choose the sqlalchemy template.
  3. Configure the Recipe: Use the YAML configuration similar to the example provided above, adjusting the connect_uri and other parameters as needed for your DB2 setup.
  4. Execute the Ingestion: Schedule and run the ingestion job.
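
Since the question is about the CLI rather than the UI, the same recipe can instead be saved to a file and executed with `datahub ingest`. A minimal sketch (the file name and all connection values are placeholders):

```python
from pathlib import Path
from textwrap import dedent

# Write the recipe shown above to a file; values are placeholders.
recipe = dedent("""\
    source:
        type: sqlalchemy
        config:
            platform: db2
            connect_uri: 'db2+ibm_db://user:password@host:50000/SAMPLE'
            include_tables: true
            include_views: true
    sink:
        type: datahub-rest
        config:
            server: ${DATAHUB_GMS_HOST}
            token: ${DATAHUB_GMS_TOKEN}
    """)
Path("db2_recipe.yml").write_text(recipe)
# Then run from the shell:  datahub ingest -c db2_recipe.yml
```

The ${...} variables are expanded from the environment by the DataHub CLI at ingestion time, so secrets need not be written into the file.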

Would you like more detailed steps or have specific questions about any part of the process?
