Troubleshooting DataHub Ingestion Issues

Original Slack Thread

This message was deleted.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To help diagnose the issue with your ingestion, I need a bit more information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery, PostgreSQL)?
  4. Can you provide the specific error message you are encountering?

In the meantime, here are some general troubleshooting steps you can take:

  1. Check Logs: Review the detailed logs for specific error messages, which give more context about what is going wrong. You can access them in the DataHub UI by opening the ingestion run details, or by running docker logs against the relevant container.

  2. Verify Configuration: Ensure that your ingestion recipe is correct, since a misconfiguration can cause the run to fail. For example, make sure that host_port, username, password, and other required fields are set correctly.

  3. Network Connectivity: Make sure that the DataHub instance can reach the data source. Verify that DNS resolution works and check for network policies that might be blocking the connection.

  4. Container Health: Ensure that all necessary Docker containers are running and healthy. You can verify the health of the containers using docker ps. Specifically, check the datahub-actions container, which is responsible for executing ingestion requests.

  5. Resource Allocation: Ensure that your Docker environment has sufficient resources allocated to the DataHub components.
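
Steps 2 through 4 can be partly automated. The following is a minimal Python sketch, not an official DataHub tool. It assumes the recipe has already been loaded into a dict with the usual source/config layout, that host_port has the form host:port, and that container status comes from docker ps -a --format '{{.Names}}\t{{.Status}}'. The helper names and the REQUIRED_FIELDS list are hypothetical; the fields that are actually required depend on your source type.

```python
import socket

# Hypothetical list of required fields; the real set depends on the source type.
REQUIRED_FIELDS = ("host_port", "username", "password")


def missing_recipe_fields(recipe, required=REQUIRED_FIELDS):
    """Return the required source config fields that are absent or empty."""
    config = recipe.get("source", {}).get("config", {}) or {}
    return [name for name in required if not config.get(name)]


def split_host_port(host_port):
    """Split a 'host:port' string into (host, port). Assumes a port is present."""
    host, _, port = host_port.rpartition(":")
    return host, int(port)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This covers both DNS resolution and basic reachability, but says nothing
    about whether the credentials are valid.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unhealthy_containers(docker_ps_output):
    """Return names of containers that are not running or report unhealthy.

    Expects tab-separated "name<TAB>status" lines, as printed by:
        docker ps -a --format '{{.Names}}\t{{.Status}}'
    """
    flagged = []
    for line in docker_ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not status.startswith("Up") or "(unhealthy)" in status:
            flagged.append(name)
    return flagged
```

A typical use would be to call missing_recipe_fields on the loaded recipe first, then pass the result of split_host_port to can_connect from the machine or container that runs the ingestion, since that is where network policies apply. Containers without a health check only report "Up ...", so unhealthy_containers treats them as fine.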

For more detailed steps on debugging a failed ingestion run, you can refer to the <Ingestion | DataHub documentation on Debugging Ingestion>.

Would you like more detailed steps or further assistance on any specific part of the process?

Sources:

  • <Ingestion | DataHub documentation on Debugging Ingestion>