Troubleshooting Kafka Topic Unavailability in DataHub Configuration

Original Slack Thread

Hi, I am trying to run a datahub “hello world” action which looks like this:

# Pipeline name
name: "hello_world"

# 1. Event Source: where to source event from.
source:
  type: "kafka"
  config:
    connection:
      bootstrap: localhost:9092
      schema_registry_url: https://localhost:8081

# 2. Action: what action to take on events.
action:
  type: "hello_world"

When I run datahub actions -c hello_world.yaml, I get the following error:
Kafka consume error: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: MetadataChangeLog_Versioned_v1: Broker: Unknown topic or partition"}
Kafka consume error: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: PlatformEvent_v1: Broker: Unknown topic or partition"}

I am not sure why these two topics are unavailable. Does anyone know what’s happening or what’s causing it? Thanks

Hey Poojan, could you please check whether the topic PlatformEvent_v1 exists in your Kafka cluster?

I checked the list of topics. As you can see below, both PlatformEvent_v1 and MetadataChangeLog_Versioned_v1 exist:

/kafka-topics.sh --list --bootstrap-server localhost:9092
DataHubUpgradeHistory_v1
DataHubUsageEvent_v1
FailedMetadataChangeEvent_v4
FailedMetadataChangeProposal_v1
MetadataAuditEvent_v4
MetadataChangeEvent_v4
MetadataChangeLog_Timeseries_v1
MetadataChangeLog_Timeseries_v1.json
MetadataChangeLog_Versioned_v1
MetadataChangeProposal_v1
PlatformEvent_v1
__consumer_offsets
_schemas

Is there any other way to check? Thanks
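
One further check, offered as a rough sketch rather than an official DataHub tool: ask the broker for its topic list from the same environment where `datahub actions` runs, then compare it with the two topics named in the error. This assumes the `confluent-kafka` Python package is installed there. The helper names `fetch_topics` and `report_missing` are made up for illustration.

```python
from difflib import get_close_matches

# Topics named in the error messages above.
REQUIRED_TOPICS = ["MetadataChangeLog_Versioned_v1", "PlatformEvent_v1"]


def fetch_topics(bootstrap: str, timeout: float = 10.0) -> list[str]:
    """Return the topic names the broker at `bootstrap` reports to this client."""
    # Imported lazily so report_missing() can be used without the package.
    # Assumption: confluent-kafka is installed where the action runs.
    from confluent_kafka.admin import AdminClient

    metadata = AdminClient({"bootstrap.servers": bootstrap}).list_topics(timeout=timeout)
    return sorted(metadata.topics)


def report_missing(required: list[str], available: list[str]) -> dict[str, list[str]]:
    """Map each missing required topic to similarly named topics that do exist."""
    present = set(available)
    return {
        topic: get_close_matches(topic, available, n=3, cutoff=0.8)
        for topic in required
        if topic not in present
    }
```

Calling `report_missing(REQUIRED_TOPICS, fetch_topics("localhost:9092"))` with the bootstrap value from hello_world.yaml should return an empty dict if this client sees both topics. A non-empty result would suggest the client is talking to a different broker than the one queried with kafka-topics.sh, or that a topic name differs slightly (for example a missing underscore).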

Hi Poojan,

Can you check that your DataHub configuration (hello_world.yaml) is pointing to the Kafka instance where these topics exist?

Ensure that the Kafka instance you’re querying with kafka-topics.sh is the same one your DataHub instance is configured to use.
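
As a minimal sketch of that check, using only the Python standard library: run the snippet below on the machine where `datahub actions` runs, to see whether the bootstrap address from hello_world.yaml is reachable from there at all. The function names are hypothetical, and each bootstrap entry is assumed to have the form host:port. Note that `localhost` refers to whichever machine or container runs the code, so the same string can point at different brokers in different shells.

```python
import socket


def parse_bootstrap(bootstrap: str) -> list[tuple[str, int]]:
    """Split a 'host:port[,host:port...]' bootstrap string into (host, port) pairs."""
    pairs = []
    for entry in bootstrap.split(","):
        # rpartition splits on the last colon, so the port is always the tail.
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `[is_reachable(h, p) for h, p in parse_bootstrap("localhost:9092")]` reports whether something is listening on that address. A successful connection only shows that some listener answers there, not that it is the broker holding the DataHub topics.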

If offsets have been reset or if you want to start consuming from the beginning, you can reset the consumer group’s offsets using the kafka-consumer-groups.sh tool.

For example, to reset the offsets of the consumer group hello_world to the earliest offset for both topics:

/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group hello_world --reset-offsets --to-earliest --execute --topic MetadataChangeLog_Versioned_v1

/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group hello_world --reset-offsets --to-earliest --execute --topic PlatformEvent_v1

Hi,

Thanks for getting back to me. I am not sure whether my hello_world.yaml is pointing to the right Kafka instance. I am using the same bootstrap server in the recipe file and in the pod shell.

When I specify the group to be hello_world, it says that the server does not host this topic-partition. Also, when I list the consumer groups, hello_world doesn’t come up. Is it supposed to?