Troubleshooting Ingestion Job Stuck in Pending State on AWS EKS with SASL_SSL Protocol

Original Slack Thread

I have installed DataHub on AWS using EKS. All pods are running, but ingestion jobs started from the UI get stuck in the Pending state. When I checked the logs of the acryl pod, I found this error:

    %6|1729798575.841|FAIL|rdkafka#consumer-2| [thrd:b-3.mymskcluster2.p1ph1i.c2.kafka.eu-west-2.amazonaws.com:9096/]: b-3.mymskcluster2.p1ph1i.c2.kafka.eu-west-2.amazonaws.com:9096/bootstrap: Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 0ms in state APIVERSION_QUERY, 4 identical error(s) suppressed)

Has anybody managed to solve this? I am using MSK and the SASL_SSL protocol. The kafka-setup pod completed successfully; I checked its logs and it managed to connect to Kafka using my config. Here is part of my values.yaml file (attachment).

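The error message you're encountering, Disconnected while requesting ApiVersion, typically indicates a security protocol mismatch when connecting to a Kafka broker. As the hint in the message itself suggests ("connecting to a SSL listener?"), the most common cause is a client that attempts a plaintext connection to a listener that expects TLS. This often happens with the SSL or SASL_SSL protocols, especially in AWS MSK environments, when the security settings do not reach every component.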
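
One quick sanity check is to read the broker port out of the log line and compare it with the listener type you expect. The sketch below assumes the default in-VPC listener ports documented for Amazon MSK (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM); clusters with public access or custom networking may use other ports. The function names are illustrative only.

```python
import re

# Assumption: default in-VPC listener ports documented for Amazon MSK.
MSK_PORTS = {
    9092: "PLAINTEXT",
    9094: "TLS (SSL)",
    9096: "SASL/SCRAM over TLS (SASL_SSL)",
    9098: "IAM over TLS (SASL_SSL)",
}


def broker_ports(log_line):
    """Return the distinct ports that follow an MSK broker hostname in a log line."""
    return {int(port) for port in re.findall(r"amazonaws\.com:(\d+)", log_line)}


def describe(log_line):
    """Map each broker port found in the log line to its expected listener type."""
    return {
        port: MSK_PORTS.get(port, "unknown listener")
        for port in sorted(broker_ports(log_line))
    }
```

For the log line in this thread, `describe` reports port 9096, the SASL/SCRAM listener, so the client would need security.protocol set to SASL_SSL together with a SCRAM mechanism.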

Here are some steps and considerations to help resolve the issue:

  1. Verify Security Protocol Configuration:
    Ensure that the security protocol is correctly set in your configuration. For AWS MSK with IAM authentication over SASL_SSL, a Java client configuration looks something like this:

    consumer_config:
      security.protocol: SASL_SSL
      sasl.mechanism: AWS_MSK_IAM
      sasl.jaas.config: software.amazon.msk.auth.iam.IAMLoginModule required;
      sasl.client.callback.handler.class: software.amazon.msk.auth.iam.IAMClientCallbackHandler
    

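    Make sure that the settings matching your authentication method are present in your values.yaml file. Note that sasl.jaas.config and sasl.client.callback.handler.class are Java client properties; librdkafka-based clients, such as the one that logs the rdkafka error above, do not read them.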

  2. Check IAM Role and Permissions:
    Ensure that the IAM role associated with your EKS pods has the necessary permissions to access the MSK cluster. The role should have policies that allow it to connect to the MSK brokers.

  3. Network Connectivity:
    Verify that your EKS cluster can reach the MSK brokers. This includes checking security groups, VPC settings, and any network ACLs that might be blocking traffic.

  4. Broker Version Compatibility:
    Double-check that your Kafka broker version is compatible with the client version you are using. The error message suggests checking if the broker version is less than 0.10, which is unlikely with MSK, but it’s worth verifying.

  5. Logs and Debugging:
    Review the detailed logs from the DataHub ingestion job and the Kafka broker for any additional error messages or warnings that might provide more context.

  6. Example Configuration:
    Here is an example configuration for connecting to AWS MSK using IAM authentication:

    source:
      type: "kafka"
      config:
        connection:
          bootstrap: "<your_kafka_bootstrap_server>"
          consumer_config:
            security.protocol: "SASL_SSL"
            sasl.mechanism: "AWS_MSK_IAM"
            sasl.jaas.config: "software.amazon.msk.auth.iam.IAMLoginModule required;"
            sasl.client.callback.handler.class: "software.amazon.msk.auth.iam.IAMClientCallbackHandler"
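
The network check in step 3 can be sketched as a small TLS probe run from inside the cluster (for example from a debug pod with Python available). It only confirms that a TCP connection and TLS handshake succeed; it does not test SASL authentication. The default port 9096 is taken from the log line in this thread, the function names are hypothetical, and the sketch assumes the broker certificate chains to a CA in the default trust store.

```python
import socket
import ssl
import sys


def split_host_port(broker, default_port=9096):
    """Split 'host:port' into (host, port); use default_port when no port is given."""
    host, sep, port = broker.rpartition(":")
    if not sep:
        return broker, default_port
    return host, int(port)


def probe_tls(broker, timeout=5.0):
    """Connect to the broker, complete a TLS handshake, and return the TLS version."""
    host, port = split_host_port(broker)
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    # Usage: python probe.py <broker-host>:9096 [<broker-host>:9096 ...]
    for broker in sys.argv[1:]:
        try:
            print(broker, "->", probe_tls(broker))
        except OSError as exc:  # covers timeouts, refused connections, TLS errors
            print(broker, "-> FAILED:", exc)
```

A successful handshake from the acryl pod's namespace suggests that security groups and routing are fine and that the problem lies in the client configuration rather than the network.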
    

Would you like more detailed step-by-step guidance on any of these points?

Thank you for your response. However, I am not using the AWS_MSK_IAM mechanism but SCRAM-SHA-512 with the SASL_SSL security protocol. I have tested it and it works with my credentials: the kafka-setup pod managed to connect to the brokers, and all topics were created successfully. Only the acryl pod throws the error.

I still need assistance @RunLLM

Given that you are using the SCRAM-SHA-512 mechanism with the SASL_SSL security protocol, and the Kafka setup pod was able to connect successfully, the issue might be specific to the configuration of the acryl pod. Here are some steps to troubleshoot and resolve the issue:

  1. Verify SASL Configuration:
    Ensure that the SASL settings actually reach the acryl pod. The rdkafka prefix in the error shows that this pod uses a librdkafka-based client, which takes credentials through sasl.username and sasl.password rather than the Java sasl.jaas.config property. Here is an example configuration:

    consumer_config:
      security.protocol: SASL_SSL
      sasl.mechanism: SCRAM-SHA-512
      sasl.username: your-username
      sasl.password: your-password
    
  2. Check Environment Variables:
    Make sure that the environment variables for Kafka configuration are correctly set in the acryl pod. This includes variables like KAFKA_BOOTSTRAP_SERVER, KAFKA_PROPERTIES_SASL_MECHANISM, and KAFKA_PROPERTIES_SECURITY_PROTOCOL.

  3. Network and Security Groups:
    Double-check the network settings and security groups to ensure that the acryl pod has the necessary permissions and network access to connect to the Kafka brokers.

  4. Logs and Debugging:
    Review the logs from the acryl pod for any additional error messages or warnings that might provide more context. This can help identify if there are any specific issues with the connection or authentication.

  5. Example Configuration:
    Here is an example configuration for connecting to Kafka using SCRAM-SHA-512:

    source:
      type: "kafka"
      config:
        connection:
          bootstrap: "<your_kafka_bootstrap_server>"
          consumer_config:
            security.protocol: "SASL_SSL"
            sasl.mechanism: "SCRAM-SHA-512"
            sasl.username: "your-username"
            sasl.password: "your-password"
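
Steps 1 and 2 can be combined into a small check run inside the acryl pod: see which Kafka variables are set, build a librdkafka-style configuration from them, and try to fetch cluster metadata. This is only a sketch. The first three variable names come from step 2; KAFKA_PROPERTIES_SASL_USERNAME and KAFKA_PROPERTIES_SASL_PASSWORD are assumed names that may differ in your chart, and the mapping to librdkafka keys is illustrative rather than a description of how DataHub reads them. The connection test needs the confluent-kafka package.

```python
import os

# Assumption: environment variable names; adjust to what your deployment sets.
ENV_TO_RDKAFKA = {
    "KAFKA_BOOTSTRAP_SERVER": "bootstrap.servers",
    "KAFKA_PROPERTIES_SECURITY_PROTOCOL": "security.protocol",
    "KAFKA_PROPERTIES_SASL_MECHANISM": "sasl.mechanism",
    "KAFKA_PROPERTIES_SASL_USERNAME": "sasl.username",  # hypothetical name
    "KAFKA_PROPERTIES_SASL_PASSWORD": "sasl.password",  # hypothetical name
}


def config_from_env(env):
    """Build a librdkafka-style config dict from the variables that are set."""
    return {key: env[name] for name, key in ENV_TO_RDKAFKA.items() if env.get(name)}


def missing_vars(env):
    """List the expected variables that are unset or empty."""
    return [name for name in ENV_TO_RDKAFKA if not env.get(name)]


def masked(config):
    """Return a copy of the config that is safe to print."""
    return {k: ("***" if k == "sasl.password" else v) for k, v in config.items()}


def check_connection(config, timeout=10.0):
    """Fetch cluster metadata; raises KafkaException if the brokers are unreachable."""
    from confluent_kafka import Consumer  # requires: pip install confluent-kafka

    consumer = Consumer({**config, "group.id": "connectivity-check"})
    try:
        return sorted(consumer.list_topics(timeout=timeout).topics)
    finally:
        consumer.close()
```

In a Python shell inside the pod, `missing_vars(os.environ)` shows what is absent, `masked(config_from_env(os.environ))` shows what a client would receive, and `check_connection(config_from_env(os.environ))` attempts the connection. If security.protocol is missing here, a plaintext default would be consistent with the ApiVersion error in the original log.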
    

Would you like more detailed step-by-step guidance on any of these points?
