Troubleshooting DataHub GMS Startup Failure After Enabling METADATA_SERVICE_AUTH_ENABLED

Original Slack Thread

Hi All,

I set the METADATA_SERVICE_AUTH_ENABLED environment variable to true for both datahub-gms and datahub-frontend. However, when I restart the datahub-gms service to apply the new configuration, the service fails to stay up and I see the following error logs:
2024-12-27 09:32:03,352 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.ConsumerCoordinator:294 - [Consumer clientId=consumer-datahub-usage-event-consumer-job-client-4, groupId=datahub-usage-event-consumer-job-client] Revoke previously assigned partitions DataHubUsageEvent_v1-0
2024-12-27 09:32:03,342 [ThreadPoolTaskExecutor-1] INFO o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: Consumer stopped
2024-12-27 09:32:03,342 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.AbstractCoordinator:979 - [Consumer clientId=consumer-generic-mce-consumer-job-client-3, groupId=generic-mce-consumer-job-client] Member consumer-generic-mce-consumer-job-client-3-31b720de-b8c9-4cf4-8a04-fdae6df99a87 sending LeaveGroup request to coordinator b-2.its4udatahubinframsk.ekch5n.c23.kafka.us-east-1.amazonaws.com:9096 (id: 2147483645 rack: null) due to the consumer unsubscribed from all topics
2024-12-27 09:32:03,343 [ThreadPoolTaskExecutor-1] INFO o.s.k.c.DefaultKafkaConsumerFactory$ExtendedKafkaConsumer:1082 - [Consumer clientId=consumer-generic-mce-consumer-job-client-3, groupId=generic-mce-consumer-job-client] Unsubscribed all topics or patterns and assigned partitions
2024-12-27 09:32:03,343 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.AbstractCoordinator:979 - [Consumer clientId=consumer-generic-platform-event-job-client-5, groupId=generic-platform-event-job-client] Member consumer-generic-platform-event-job-client-5-2ea45993-3531-4cad-823d-fc1e055e1888 sending LeaveGroup request to coordinator b-2.its4udatahubinframsk.ekch5n.c23.kafka.us-east-1.amazonaws.com:9096 (id: 2147483645 rack: null) due to the consumer unsubscribed from all topics
2024-12-27 09:32:03,344 [ThreadPoolTaskExecutor-1] INFO o.s.k.c.DefaultKafkaConsumerFactory$ExtendedKafkaConsumer:1082 - [Consumer clientId=consumer-generic-platform-event-job-client-5, groupId=generic-platform-event-job-client] Unsubscribed all topics or patterns and assigned partitions
2024-12-27 09:32:03,346 [ThreadPoolTaskExecutor-1] INFO o.s.k.l.KafkaMessageListenerContainer:292 - generic-platform-event-job-client: Consumer stopped
2024-12-27 09:32:03,335 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.AbstractCoordinator:979 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Member consumer-mce-consumer-job-client-1-a467e443-1810-44b4-aa96-aa09c9f82217 sending LeaveGroup request to coordinator b-2.its4udatahubinframsk.ekch5n.c23.kafka.us-east-1.amazonaws.com:9096 (id: 2147483645 rack: null) due to the consumer unsubscribed from all topics
2024-12-27 09:32:03,336 [ThreadPoolTaskExecutor-1] INFO o.s.k.c.DefaultKafkaConsumerFactory$ExtendedKafkaConsumer:1082 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Unsubscribed all topics or patterns and assigned partitions

Error logs I found:
panic: interface conversion: error is *os.SyscallError, not *exec.ExitError
2024-12-27 17:12:40,568 [pool-6-thread-5] WARN c.l.m.b.s.RemoveClientIdAspectStep:46 - Error when running the RemoveUnknownAspects Bootstrap Step
key.deserializer = class org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
value.deserializer = class org.springframework.kafka.support.serializer.ErrorHandlingDeserializer

The service doesn’t stay up, and it seems like there may be issues related to Kafka deserialization and some system call errors.

Has anyone experienced this issue or know how to resolve it?
Note: I'm using AWS MSK.
Thanks in advance!

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

To address the issues you’re experiencing with the DataHub GMS service not staying up after enabling METADATA_SERVICE_AUTH_ENABLED, let’s break down the potential causes and solutions based on the error logs and context provided:

  1. Kafka Consumer Issues:

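    • The Kafka lines in the first log excerpt are INFO-level messages that consumers emit during an orderly shutdown: partitions are revoked, each consumer sends a LeaveGroup request, unsubscribes from its topics, and stops. They show that GMS was shutting down, not why. Because the consumers had assigned partitions and had found a group coordinator, the connection to MSK appears to have been working at that point, so the root cause is more likely in an earlier WARN or ERROR entry.
    • Even so, confirm that the Kafka settings are correct for MSK, including bootstrap.servers, security.protocol, and the SASL settings. Port 9096, which appears in your logs, is the MSK port for SASL/SCRAM authentication over TLS, so the client needs security.protocol=SASL_SSL, sasl.mechanism=SCRAM-SHA-512, and valid SCRAM credentials. Also verify that the topics are accessible and that the consumer group IDs are configured correctly.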
  2. Deserialization Errors:

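    • The key.deserializer and value.deserializer lines are part of the configuration listing that the Kafka client prints when each consumer is created; they are not error messages. ErrorHandlingDeserializer is Spring Kafka's wrapper around the real deserializer: if the delegate fails, the failure is handed to the container's error handler instead of breaking the consumer on every poll. Seeing it configured for both keys and values is expected.
    • If actual deserialization errors appear elsewhere in the log, check for schema compatibility issues with the Avro messages being consumed and make sure the schema registry (if used) is correctly configured and reachable.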
  3. System Call Errors:

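    • panic: interface conversion: error is *os.SyscallError, not *exec.ExitError is a Go runtime panic caused by a failed type assertion, not a Java exception. It therefore comes from a Go program inside the container rather than from the GMS Java process. The program expected a child-process error of type *exec.ExitError (the child exited with a status) but got an *os.SyscallError, which suggests that a system call failed while starting or waiting on that process. If your image launches GMS through a Go-based wrapper (for example, dockerize), that wrapper is a likely source.
    • Verify the container's entrypoint and command, the environment variables and system properties passed to GMS, and that any required dependencies are in place.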
  4. Authentication and Authorization:

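    • With METADATA_SERVICE_AUTH_ENABLED=true, GMS rejects requests that do not carry a valid access token. Make sure datahub-frontend and GMS share the same system client credentials (DATAHUB_SYSTEM_CLIENT_ID and DATAHUB_SYSTEM_CLIENT_SECRET in DataHub's standard configuration). Also make sure that ingestion jobs and other API clients send a personal access token in an "Authorization: Bearer <token>" header.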
  5. General Troubleshooting Steps:

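    • Look in the GMS logs for the first WARN or ERROR entry before the shutdown messages. The RemoveClientIdAspectStep warning about the RemoveUnknownAspects bootstrap step is one candidate worth investigating, because GMS runs its bootstrap steps during startup.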
    • Ensure that the GMS service has sufficient resources (CPU, memory) allocated.
    • Verify network connectivity between the GMS service and the Kafka brokers.
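For item 1 (Kafka Consumer Issues), the following minimal Python sketch builds the Kafka settings GMS would need to reach MSK on port 9096 (SASL/SCRAM over TLS). It assumes that GMS reads the broker list from KAFKA_BOOTSTRAP_SERVER and passes SPRING_KAFKA_PROPERTIES_* environment variables through to its Kafka clients; verify both against the DataHub version you deploy. The helper name, broker host, and credentials are hypothetical placeholders.

```python
from typing import Dict, List

# Minimal sketch (assumptions): environment variables a DataHub GMS container
# might use to reach MSK over SASL/SCRAM on port 9096. Assumes GMS reads
# KAFKA_BOOTSTRAP_SERVER and forwards SPRING_KAFKA_PROPERTIES_* to its Kafka
# clients. build_msk_scram_env is a hypothetical helper, not part of DataHub.

SCRAM_LOGIN_MODULE = "org.apache.kafka.common.security.scram.ScramLoginModule"


def build_msk_scram_env(brokers: List[str], username: str, password: str) -> Dict[str, str]:
    """Return env vars for SASL_SSL with SCRAM-SHA-512, the SCRAM variant MSK supports."""
    if not brokers:
        raise ValueError("at least one broker is required")
    for broker in brokers:
        # Each entry must be a plain host:port pair with a numeric port.
        host, sep, port = broker.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {broker!r}")
    jaas = f'{SCRAM_LOGIN_MODULE} required username="{username}" password="{password}";'
    return {
        "KAFKA_BOOTSTRAP_SERVER": ",".join(brokers),
        "SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL": "SASL_SSL",
        "SPRING_KAFKA_PROPERTIES_SASL_MECHANISM": "SCRAM-SHA-512",
        "SPRING_KAFKA_PROPERTIES_SASL_JAAS_CONFIG": jaas,
    }


if __name__ == "__main__":
    env = build_msk_scram_env(
        ["b-1.example.kafka.us-east-1.amazonaws.com:9096"],  # placeholder broker
        "datahub",    # placeholder SCRAM username
        "change-me",  # placeholder; load the real secret from a secret store
    )
    for key, value in env.items():
        # Avoid echoing the password embedded in the JAAS string.
        shown = "<redacted>" if key.endswith("JAAS_CONFIG") else value
        print(f"{key}={shown}")
```

If the cluster uses IAM authentication (port 9098) instead of SCRAM, the mechanism and login module are different, so this sketch would not apply as written.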
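For item 4 (Authentication and Authorization), the following minimal standard-library sketch checks whether GMS enforces tokens. It assumes that GMS listens on http://localhost:8080 (the default in DataHub's Docker setup) and that it exposes GraphQL at /api/graphql. It also assumes a personal access token is sent as "Authorization: Bearer <token>". The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

GMS_URL = "http://localhost:8080"  # assumption: default GMS address; change for your deployment
ME_QUERY = "{ me { corpUser { username } } }"  # simple query that needs an authenticated caller


def build_gms_graphql_request(gms_url: str, token: Optional[str]) -> urllib.request.Request:
    """Build a POST to GMS's GraphQL endpoint, adding a bearer token if one is given."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": ME_QUERY}).encode("utf-8")
    return urllib.request.Request(
        f"{gms_url.rstrip('/')}/api/graphql", data=body, headers=headers, method="POST"
    )


def probe(gms_url: str, token: Optional[str]) -> int:
    """Return the HTTP status GMS answers with; connection failures raise URLError."""
    try:
        with urllib.request.urlopen(build_gms_graphql_request(gms_url, token), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; report their status code.
        return err.code
```

For example, if probe(GMS_URL, None) returns 401 while probe(GMS_URL, "<your token>") returns 200, that suggests authentication is enforced and the token is accepted.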
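For item 5 (General Troubleshooting Steps), note that a console export of GMS logs can interleave each entry with the service name and a viewer timestamp and repeat lines, which makes the first real error hard to spot. The following small standard-library sketch drops those residue lines and removes verbatim duplicates while preserving order. It keeps only WARN, ERROR, and Go panic lines. The residue patterns and the log-line layout are assumptions based on the excerpts in this thread and may need adjusting for your log viewer.

```python
import re

# Lines a log console may add around each entry (assumed patterns; adjust as needed).
RESIDUE = [
    re.compile(r"^datahub-gms$"),
    re.compile(r"^[A-Z][a-z]+ \d{1,2}, \d{4} at \d{2}:\d{2} \(UTC[+-]\d{1,2}:\d{2}\)$"),
]
# Keep "date time [thread] LEVEL ..." entries at WARN/ERROR, plus Go runtime panics.
INTERESTING = re.compile(r"^\S+ \S+ \[[^\]]+\] (WARN|ERROR)\b|^panic: ")


def triage(lines):
    """Return unique WARN/ERROR/panic lines in their original order."""
    seen = set()
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line or any(p.match(line) for p in RESIDUE):
            continue  # skip blank lines and console residue
        if line in seen:
            continue  # drop verbatim duplicates
        seen.add(line)
        if INTERESTING.search(line):
            kept.append(line)
    return kept
```

For example, triage(open("gms-console-export.txt", encoding="utf-8")) returns the candidate lines to read first (the file name is a placeholder).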

Would you like more detailed steps on any of these points, or do you have additional information about your current setup that might help narrow down the issue?

Hi, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.