<@U06TM7M40RK> We use managed connectors in Kafka Connect; however, we are seeing the error below while running the Kafka Connect recipe:
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (kafka-connect): HTTPSConnectionPool(host='bootstrap_server_url', port=8083): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at XXXXXXXXXXX>: Failed to establish a new connection: [Errno 110] Connection timed out'))
The error message you’re encountering indicates that the DataHub ingestion pipeline is unable to establish a connection to the Kafka Connect server at the specified bootstrap_server_url on port 8083. This is typically due to network connectivity issues or incorrect configuration settings.
Here are some steps to troubleshoot and resolve this issue:
- Verify Kafka Connect Configuration: Ensure that the host in your configuration (bootstrap_server_url in the error above) is correct. It should point to the hostname or IP address of your Kafka Connect server, which is not necessarily the same host as the Kafka broker bootstrap server.
- Check Network Connectivity: Make sure that the machine running the DataHub ingestion pipeline can reach the Kafka Connect server. You can test this by using tools like telnet or nc to check connectivity to the Kafka Connect port (8083):
  telnet bootstrap_server_url 8083
- Firewall and Security Groups: Check if there are any firewall rules or security groups that might be blocking the connection to the Kafka Connect server. Ensure that port 8083 is open and accessible.
- Kafka Connect Server Health: Verify that the Kafka Connect server is running and healthy. You can do this by accessing the Kafka Connect REST API directly in a web browser or using a tool like curl (use https:// instead if that is the scheme in your connect_uri, as the HTTPSConnectionPool in the error suggests):
  curl http://bootstrap_server_url:8083/
- Configuration in DataHub Recipe: Ensure that your DataHub ingestion recipe is correctly configured to use the Kafka Connect source. The configuration should include the correct connect_uri pointing to your Kafka Connect server:
  source:
    type: kafka-connect
    config:
      connect_uri: "http://bootstrap_server_url:8083"
- Logs and Error Messages: Check the logs of your Kafka Connect server for any error messages or warnings that might provide more context about the issue.
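If telnet or nc is not installed on the ingestion host, the same reachability check can be sketched in Python, which is already present wherever the DataHub CLI runs. This is only an illustration: the helper names split_connect_uri and can_connect are hypothetical and are not part of DataHub, and the URI is the placeholder from the error message.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def split_connect_uri(connect_uri: str) -> Tuple[str, int]:
    """Return (host, port) from a connect_uri, defaulting the port by scheme."""
    parts = urlsplit(connect_uri)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URI: {connect_uri!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection, roughly what `telnet host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

Calling can_connect(*split_connect_uri("https://bootstrap_server_url:8083")) from the machine that runs the recipe returns False in the same situations that produce the timeout above, which helps separate a network problem from a recipe problem.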
If you need more detailed guidance on any of these steps, please let me know.
Sources:
- <https://datahubproject.io/docs/0.13.1/ui-ingestion#debugging-a-failed-ingestion-run|DataHub Documentation on Debugging Ingestion>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/kafka_connect/kafka_connect.py|Kafka Connect Source Configuration>
Hint: Mention @runllm in the thread for followups.
Hi <@U02TH1AS8S2>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
<@U06TM7M40RK>, I have checked my Kafka Connect recipe and everything looks correct, as described in your suggestions. I am still seeing the same error.
If you’ve verified that your Kafka Connect configuration and network settings are correct, but you’re still encountering the connection timeout error, here are a few additional steps you can take to troubleshoot the issue:
- Check DNS Resolution: Ensure that the hostname bootstrap_server_url can be resolved to an IP address from the machine where the DataHub ingestion pipeline is running. You can test this by running a DNS lookup or ping command:
  nslookup bootstrap_server_url
  ping bootstrap_server_url
- Use IP Address: If DNS resolution is an issue, try using the IP address of the Kafka Connect server instead of the hostname in your configuration.
- Proxy Settings: If your environment uses a proxy, ensure that the proxy settings are correctly configured to allow traffic to the Kafka Connect server.
- SSL/TLS Configuration: If your Kafka Connect server is configured to use SSL/TLS, ensure that your DataHub ingestion configuration includes the necessary SSL settings. This might include specifying the SSL protocol, certificate locations, and any required authentication details.
- Increase Timeout Settings: If network latency is high, consider increasing the connection timeout settings in your DataHub ingestion configuration.
- Check Kafka Connect Logs: Review the logs of your Kafka Connect server for any errors or warnings that might indicate why the connection is failing. This can provide more context on what might be going wrong.
- Test with a Simple Client: Try connecting to the Kafka Connect server using a simple HTTP client (e.g., curl) from the same machine where the DataHub ingestion pipeline is running. This can help determine if the issue is specific to the DataHub configuration or a more general connectivity problem:
  curl http://bootstrap_server_url:8083/
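The DNS and simple-client checks above can also be run from Python on the ingestion host, using only the standard library. This is a rough sketch rather than anything DataHub provides: resolve and probe_connect are hypothetical helper names, and it assumes the endpoint answers a GET on / with a JSON body, as a Kafka Connect worker's REST API normally does.

```python
import json
import socket
import urllib.request
from typing import List


def resolve(host: str) -> List[str]:
    """Return the IP addresses a hostname resolves to (empty list on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr starts with the IP address; de-duplicate and sort.
    return sorted({info[4][0] for info in infos})


def probe_connect(connect_uri: str, timeout: float = 5.0) -> dict:
    """GET the root of the Connect REST API and return the parsed JSON body.

    Raises urllib.error.URLError (or a timeout error) if the endpoint cannot
    be reached, which mirrors the failure seen during ingestion.
    """
    with urllib.request.urlopen(connect_uri, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

An empty list from resolve("bootstrap_server_url") points at DNS, while an exception from probe_connect with a host that does resolve points at routing, firewall, or proxy settings. Note that urllib honors proxy environment variables such as HTTPS_PROXY, so the result also reflects the proxy configuration of the shell it runs in.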
If these steps do not resolve the issue, please let me know, and we can explore further troubleshooting options.
Sources:
- <https://datahubproject.io/docs/0.13.1/ui-ingestion#debugging-a-failed-ingestion-run|DataHub Documentation on Debugging Ingestion>
<@U07FV3UHE4D> Has anyone used Confluent-managed Kafka and tried ingesting Kafka Connect metadata?
<@U02TH1AS8S2> I haven’t used Kafka Connect, but do have some experience with Kafka. The errors you get are often very generic. I would suggest checking network and firewall again if you can