Troubleshooting Kafka-Connect Ingest Error in DataHub Version 0.14.0.4

Original Slack Thread

Regarding Kafka Connect ingestion in this DataHub version:

DataHub CLI version: 0.14.0.4
Models: bundled
Python version: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]

I get the following error:

```
'message': 'Ingestion pipeline raised an unexpected exception!',
'context': ["<class 'sqlalchemy.exc.ArgumentError'>: Could not parse SQLAlchemy URL from string "
```
I used the following recipe:

```
source:
  type: "kafka-connect"
  config:
    # Coordinates
    connect_uri: "https://broker1.our.broker.address:8083"
    platform_instance: "kafka_connect_devfuture"
    env: DEV 
    # Credentials
    username: '${KAFKA_CONNECT_USER}'
    password: '${KAFKA_CONNECT_PASSWORD}'
    convert_lineage_urns_to_lowercase: true

    # Optional
    # Platform instance mapping to use when constructing URNs.
    # Use if single instance of platform is referred across connectors.
    # Map of platform name to platform instance
    platform_instance_map:
      oracle: ods
      postgres: ps_release_dev
      kafka: devfuture_kafka


# Recipe sink configuration.
sink:
  type: "datahub-rest"
  config:
    server: '${AMDH_SERVER}' 
    token: '${AMDH_INGEST_TOKEN}'
```
Any ideas? THX

To address the error you’re encountering with the Kafka-Connect ingestion in DataHub, let’s break down the potential issues and solutions based on the provided data sources.

Potential Issues and Solutions

  1. SQLAlchemy URL Parsing Error:
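    The error message "Could not parse SQLAlchemy URL from string" is raised by SQLAlchemy, which parses database URLs. The connect_uri in your recipe is a plain HTTP(S) endpoint for the Kafka Connect REST API, so it is unlikely to be the string that fails. A more likely candidate is a database connection string that the source reads from one of your connectors, for example the connection.url of a JDBC connector. Check whether any connector uses a URL that does not follow the dialect://host:port/database shape.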

  2. Stateful Ingestion:
    If you are using stateful ingestion, it might be causing issues with the Kafka-Connect source. According to a related Slack thread, disabling stateful ingestion can resolve some issues with Kafka sinks. However, your recipe does not enable stateful ingestion, so this is probably not relevant here.

  3. Platform Instance Mapping:
    Ensure that the platform_instance_map is correctly configured. According to the Kafka Connect page of the DataHub documentation, the platform_instance_map should map each platform name to its platform instance.

  4. Environment Variables:
    Verify that the environment variables (${KAFKA_CONNECT_USER}, ${KAFKA_CONNECT_PASSWORD}, ${AMDH_SERVER}, ${AMDH_INGEST_TOKEN}) are correctly set and accessible in your environment.
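
One way to test the SQLAlchemy hypothesis from item 1 is to look at the database URLs in your connector configs. The sketch below assumes (this is not confirmed in the thread) that the failing string is a JDBC connection.url with the jdbc: prefix removed. The regular expression is only a rough stand-in for the dialect:// shape SQLAlchemy expects, not its real parser, and the sample URLs and connector names are hypothetical.

```python
import re

# Rough approximation of the "<dialect>[+driver]://..." shape of a SQLAlchemy URL.
# Illustration only: this is NOT SQLAlchemy's actual parsing logic.
_URL_SHAPE = re.compile(r"^[\w+]+://")


def strip_jdbc_prefix(url: str) -> str:
    """Drop a leading 'jdbc:' so the remainder can be compared to the expected shape."""
    return url[len("jdbc:"):] if url.startswith("jdbc:") else url


def looks_parseable(jdbc_url: str) -> bool:
    """Return True if the URL has the 'name://' shape once 'jdbc:' is removed."""
    return bool(_URL_SHAPE.match(strip_jdbc_prefix(jdbc_url)))


# Hypothetical connection.url values, not taken from the thread.
SAMPLE_URLS = {
    "postgres-source": "jdbc:postgresql://db.example.internal:5432/appdb",
    "oracle-source": "jdbc:oracle:thin:@db.example.internal:1521:ORCL",
}

if __name__ == "__main__":
    for name, url in SAMPLE_URLS.items():
        status = "ok" if looks_parseable(url) else "likely unparseable"
        print(f"{name}: {status}")
```

If a connector shows up as "likely unparseable", temporarily excluding it from the ingestion run is a quick way to confirm whether it triggers the exception.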

Debugging Steps

  1. Check Connection URI:
    Ensure that the connect_uri is correctly formatted and accessible.

  2. Validate Environment Variables:
    Make sure all environment variables used in the recipe are correctly set.

  3. Review Platform Instance Mapping:
    Double-check the platform_instance_map to ensure it correctly maps platform names to instances.

  4. Disable Stateful Ingestion:
    If you have stateful ingestion enabled elsewhere, try disabling it to see if it resolves the issue.
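
The first two debugging steps can be partly automated. The following is a minimal sketch using only the Python standard library: it lists ${VAR} placeholders in a recipe that are unset in the environment and flags obvious problems in a connect_uri value. The helper names missing_env_vars and connect_uri_problems are made up for this example and are not part of DataHub.

```python
import os
import re
from urllib.parse import urlparse

# Matches ${NAME} placeholders as they appear in the recipe.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(recipe_text: str, env=None) -> list[str]:
    """Return the ${VAR} names used in the recipe that are unset or empty."""
    env = os.environ if env is None else env
    names = sorted(set(_PLACEHOLDER.findall(recipe_text)))
    return [name for name in names if not env.get(name)]


def connect_uri_problems(uri: str) -> list[str]:
    """Return a list of obvious formatting problems in a connect_uri value."""
    problems = []
    if uri != uri.strip() or uri.startswith("<") or uri.endswith(">"):
        problems.append("stray characters around the URI (for example '<' and '>')")
    parsed = urlparse(uri.strip().strip("<>"))
    if parsed.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parsed.hostname:
        problems.append("missing host")
    return problems


if __name__ == "__main__":
    recipe = "username: '${KAFKA_CONNECT_USER}'\npassword: '${KAFKA_CONNECT_PASSWORD}'"
    print("unset variables:", missing_env_vars(recipe))
    print("uri problems:", connect_uri_problems("https://broker1.our.broker.address:8083"))
```

This only checks formatting. It does not verify that the Kafka Connect endpoint is reachable or that the credentials are accepted.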

Example Recipe

Here is an example recipe based on the provided information, with the formatting cleaned up:

```
source:
  type: "kafka-connect"
  config:
    connect_uri: "https://broker1.our.broker.address:8083"
    platform_instance: "kafka_connect_devfuture"
    env: DEV
    username: '${KAFKA_CONNECT_USER}'
    password: '${KAFKA_CONNECT_PASSWORD}'
    convert_lineage_urns_to_lowercase: true
    platform_instance_map:
      oracle: ods
      postgres: ps_release_dev
      kafka: devfuture_kafka

sink:
  type: "datahub-rest"
  config:
    server: '${AMDH_SERVER}'
    token: '${AMDH_INGEST_TOKEN}'
```

Would you like more detailed steps or further assistance on any specific part of the configuration?

How should I pass the correct URL to kafka_connect.py?