Troubleshooting Kafka Replication Factor & Data Sync Issues in Datahub 0.12.0

Original Slack Thread

Hello, we are using DataHub version 0.12.0. For the Kafka prerequisites we are using a Kafka cluster with three brokers, a replication factor of 3, and a partition count of 1. By default, the Helm chart deployment uses a replication factor of 1. After we updated the replication factor to 3, we started seeing issues with data sync between the database and the Elasticsearch index. Should we revert the replication factor, and what are the recommended Kafka broker count, replication factor, and partition count?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

We are using DataHub version 0.12.0. We are not seeing errors in the logs, but users newly added to DataHub are not showing up in the UI.

How are you adding users to DataHub? Is new metadata ingestion working as normal, and are changes/updates reflected in DataHub? What makes you think increasing the replication factor to 3 caused the problem?
Most probably Kafka is not the problem here. For DataHub, three brokers with replication=3 should be sufficient for most use cases. After increasing the replication factor to 3, make sure Kafka is working fine and check the Kafka logs.

<@U0445MUD81W> thanks for the information. I have verified the Kafka logs and everything seems to be working as expected. From my troubleshooting, I found there is consumer lag on the MetadataChangeLog_Versioned_v1 topic when I check the offsets:

```
generic-mae-consumer-job-client MetadataChangeLog_Timeseries_v1 0          335162          335162          0               consumer-generic-mae-consumer-job-client-5-cf59af31-91af-462e-b2e8-17af63a6eb1a /  consumer-generic-mae-consumer-job-client-5
generic-mae-consumer-job-client MetadataChangeLog_Versioned_v1  0          1930907         2262728         331821          consumer-generic-mae-consumer-job-client-5-cf59af31-91af-462e-b2e8-17af63a6eb1a /  consumer-generic-mae-consumer-job-client-5
```
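For readers following along, the lag in the output above is the log-end offset minus the committed offset (2262728 - 1930907 = 331821). A minimal Python sketch of that check follows. It assumes the rows have the column order shown above (group, topic, partition, current offset, log-end offset, lag); the function names `parse_consumer_group_rows` and `lagging` are hypothetical helpers, not part of Kafka or DataHub, and the sample rows are shortened to the first six columns.

```python
# Sample rows copied from the thread, trimmed to the first six columns
# (the consumer-id / host / client-id columns are not needed here).
SAMPLE_OUTPUT = """\
generic-mae-consumer-job-client MetadataChangeLog_Timeseries_v1 0 335162 335162 0
generic-mae-consumer-job-client MetadataChangeLog_Versioned_v1 0 1930907 2262728 331821
"""


def parse_consumer_group_rows(lines):
    """Parse whitespace-separated consumer-group rows into dicts.

    Rows whose offset columns are not plain integers (for example a
    header line) are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue
        group, topic, partition, current, end = parts[:5]
        if not (partition.isdigit() and current.isdigit() and end.isdigit()):
            continue
        rows.append({
            "group": group,
            "topic": topic,
            "partition": int(partition),
            "current_offset": int(current),
            "log_end_offset": int(end),
            # Lag is recomputed rather than read from the sixth column.
            "lag": int(end) - int(current),
        })
    return rows


def lagging(rows, threshold=0):
    """Return the rows whose lag exceeds the threshold."""
    return [row for row in rows if row["lag"] > threshold]


if __name__ == "__main__":
    for row in lagging(parse_consumer_group_rows(SAMPLE_OUTPUT.splitlines())):
        print(row["topic"], row["partition"], row["lag"])
```

Running the check several times and comparing the lag values shows whether the consumer is slowly catching up or not committing offsets at all.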

It looks like an issue with the MAE consumer: it is failing to process messages and commit offsets.
Are you running with a standalone consumer or the embedded consumer within GMS? Check the GMS/MAE consumer logs for any INFO and ERROR messages during message processing.
In GMS, MAE consumer log lines will start something like this:
2024-01-22 14:12:05,756 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-generic-mae-consumer-job-client-5, groupId=generic-mae-consumer-job-client]. ...
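The log check suggested above can be sketched as a small Python filter. This is only an illustration: the regular expression assumes every log line follows the layout of the single sample line above (timestamp, thread in brackets, level, logger, then ` - ` and the message), and `consumer_log_lines` is a hypothetical helper name. The real GMS log layout may differ depending on the logging configuration.

```python
import re

# Pattern inferred from the one sample line in the thread; treat it as an assumption.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)

SAMPLE_LINE = (
    "2024-01-22 14:12:05,756 [ThreadPoolTaskExecutor-1] INFO "
    "o.a.k.c.c.i.ConsumerCoordinator - [Consumer "
    "clientId=consumer-generic-mae-consumer-job-client-5, "
    "groupId=generic-mae-consumer-job-client]. ..."
)


def consumer_log_lines(lines, group_id, levels=("INFO", "WARN", "ERROR")):
    """Return (timestamp, level, message) for lines that mention the consumer group."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # continuation lines such as stack traces do not match
        if match["level"] in levels and f"groupId={group_id}" in match["msg"]:
            hits.append((match["ts"], match["level"], match["msg"]))
    return hits


if __name__ == "__main__":
    for ts, level, msg in consumer_log_lines(
        [SAMPLE_LINE], "generic-mae-consumer-job-client"
    ):
        print(ts, level, msg)
```

Because stack-trace continuation lines do not match the pattern, this filter only locates the relevant entries; the surrounding lines in the raw log still need to be read for the full error.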

We are running with embedded consumers; we have disabled the MAE and MCE consumers.