Troubleshooting Stuck DataHub Ingestions from Terminal

Original Slack Thread

How can I stop stuck ingestions that are running indefinitely on DataHub from the terminal?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


To stop stuck ingestions running indefinitely on DataHub from the terminal, you can follow these steps:

  1. Identify the Ingestion Run: First, you need to identify the specific ingestion run that is stuck. You can list all the ingestion runs using the datahub ingest list-runs command.

    datahub ingest list-runs
    
  2. Cancel the Ingestion Run: The sources for this answer do not document a datahub ingest cancel command, so do not rely on one being available. Use the run ID from step 1 to confirm which run is stuck, then stop the underlying process manually as described in step 3.

  3. Manually Stop the Process: If the ingestion run cannot be cancelled or does not respond, stop it manually by finding its process ID (PID) and killing it. If you are running DataHub in Docker, do this inside the datahub-actions container.

    # List all running containers to find the datahub-actions container
    docker ps | grep datahub-actions
    
    # Get the container ID (this assumes exactly one container matches)
    CONTAINER_ID=$(docker ps | grep datahub-actions | awk '{print $1}')
    
    # List all processes running inside the datahub-actions container
    docker exec "$CONTAINER_ID" ps aux
    
    # Replace <PID> with the PID of the stuck ingestion process.
    # Try a normal termination (SIGTERM) first so the process can clean up
    docker exec "$CONTAINER_ID" kill <PID>
    
    # If the process is still running, force-kill it
    docker exec "$CONTAINER_ID" kill -9 <PID>
    
  4. Check Logs for Issues: After stopping the stuck ingestion, check the logs of the datahub-actions container to understand why it got stuck.

    # Show the most recent 200 log lines (drop --tail to see everything)
    docker logs --tail 200 "$CONTAINER_ID"
    
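  5. Verify Resource Allocation: Ensure that your Docker environment has sufficient CPU and memory allocated to avoid future issues. By default docker stats streams live usage until interrupted; --no-stream prints a single snapshot instead.

    docker stats --no-stream "$CONTAINER_ID"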
    
  6. Check Disk Space: Verify that there is enough disk space available, as disk space issues can cause ingestion jobs to fail or get stuck.

    docker exec "$CONTAINER_ID" df -h
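
As a rough sketch of how steps 3 and 6 could be scripted, the two helpers below filter the text output of ps aux and df -P instead of reading it by eye. The function names find_pids and full_mounts are hypothetical, not part of DataHub or Docker. The sketch assumes a procps-style ps aux (PID in the second column), that the stuck run has "datahub ingest" in its command line, and a default disk threshold of 90%; adjust all three to your setup.

```shell
#!/bin/sh
# Hypothetical helpers: names, default pattern, and threshold are assumptions.

# Read `ps aux` output on stdin and print the PID (second column) of every
# process whose line contains the given text (default: "datahub ingest").
find_pids() {
    pattern="${1:-datahub ingest}"
    awk -v pat="$pattern" 'NR > 1 && index($0, pat) { print $2 }'
}

# Read `df -P` output on stdin and print the mount point of every
# filesystem whose usage is at or above the given percentage (default: 90).
full_mounts() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6
    }'
}

# Example usage (needs Docker and a running container, so it is left commented out):
#   docker exec "$CONTAINER_ID" ps aux | find_pids | while read -r pid; do
#       docker exec "$CONTAINER_ID" kill "$pid"    # SIGTERM first
#   done
#   docker exec "$CONTAINER_ID" df -P | full_mounts 90
```

Review the PIDs that find_pids prints before killing anything, since the pattern matches every process whose command line contains that text, not only the stuck run.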
    

Would you like more detailed steps on any of these troubleshooting points or further assistance on a specific part of the process?

Sources:

  • <Slack Conversation on Ingestion Issues>
  • <Ingestion | DataHub Ingestion Documentation>