Issues Connecting to Schema Registry in DataHub Kafka Topic

Original Slack Thread

Hi,
I am trying to connect to the Kafka topic in DataHub, but I am having issues connecting to the schema registry.
I am running v0.13.0 on EKS.
I want to consume the Kafka topic with Apache NiFi. I managed to connect to Kafka and fetch messages, but I am not able to deserialize them.
How can I use the GMS internal schema registry to deserialize the messages?
And no, the Actions module does not fit my use case.
Thx

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?

Sure, I can help with that! To assist you better, could you please provide the following information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (You mentioned v0.13.0, please confirm)
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery, Kafka, etc.)

Additionally, could you provide more details about the issues you’re facing with connecting to the schema registry? For example, any specific error messages or logs would be helpful.

Based on the information you provided so far, here are some general steps and considerations for connecting to a Kafka topic and schema registry in DataHub:

General Steps for Kafka Ingestion

  1. Kafka Configuration:

  2. Schema Registry Configuration:

  3. EKS Specific Configuration:

Example Configuration

Here is an example configuration for connecting to a Kafka topic and schema registry:

source:
  type: "kafka"
  config:
    connection:
      bootstrap: "your-kafka-bootstrap-server:9092"
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
      schema_registry_url: "https://your-schema-registry-url"
      schema_registry_config:
        basic.auth.user.info: "${REGISTRY_API_KEY_ID}:${REGISTRY_API_KEY_SECRET}"

If you are using a custom schema registry, you might need to implement the KafkaSchemaRegistryBase class, as shown in the documentation: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md#custom-schema-registry

Would you like more detailed steps or have specific questions about any part of the setup?

I want to connect to the internal GMS schema registry from Apache NiFi.

When configured to run with the internal schema registry, it is hosted on GMS at this URL: http://<GMS Hostname>:8080/schema-registry/api. Make sure that Apache NiFi can reach this host and port.
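
Before wiring this into NiFi, it can help to check the endpoint and the message framing from a small script. The sketch below is only an illustration under a few assumptions that are not confirmed in this thread: that the DataHub topics use the Confluent wire format (one zero "magic" byte followed by a 4-byte big-endian schema ID), and that the GMS registry answers the Confluent-style `/schemas/ids/<id>` path. The host name `datahub-gms` and the helper names are hypothetical.

```python
import json
import struct
import urllib.request


def registry_base_url(gms_host: str, port: int = 8080) -> str:
    """Build the internal schema registry URL mentioned in the thread."""
    return f"http://{gms_host}:{port}/schema-registry/api"


def schema_id_from_message(payload: bytes) -> int:
    """Read the schema ID from a message, ASSUMING Confluent wire format:
    byte 0 is the magic byte (0), bytes 1-4 are the schema ID (big-endian)."""
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("payload does not look like Confluent wire format")
    return struct.unpack(">I", payload[1:5])[0]


def fetch_schema(base_url: str, schema_id: int, timeout: float = 5.0) -> dict:
    """Fetch a schema by ID. ASSUMPTION: the registry exposes the
    Confluent-compatible /schemas/ids/<id> endpoint under base_url."""
    url = f"{base_url}/schemas/ids/{schema_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "datahub-gms" is a placeholder host; replace it with your GMS service name.
    base = registry_base_url("datahub-gms")
    print("Registry URL:", base)

    # A made-up payload: magic byte, schema ID 7, then the Avro body.
    sample = b"\x00" + struct.pack(">I", 7) + b"avro-bytes"
    print("Schema ID in sample message:", schema_id_from_message(sample))
    # fetch_schema(base, 7) would then request the schema from GMS over the network.
```

If `fetch_schema` succeeds from a pod in the same network as NiFi, the same base URL should be usable as the schema registry URL in NiFi's schema registry controller service.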