Hi,
I am trying to connect to the Kafka topic in DataHub, but I am having issues connecting to the schema registry.
I am running v0.13.0 on EKS.
I want to consume the Kafka topic with Apache NiFi. I managed to connect to Kafka and fetch messages, but I am not able to deserialize them.
How can I use the GMS internal schema registry to deserialize the messages?
And no, the action module does not serve my use case.
Thanks
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to react to its responses so other folks in the Community know if you’re still looking for help on the topic!
Sure, I can help with that! To assist you better, could you please provide the following information:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (You mentioned v0.13.0, please confirm)
- What data source(s) are you integrating with DataHub? (e.g., BigQuery, Kafka, etc.)
Additionally, could you provide more details about the issues you’re facing with connecting to the schema registry? For example, any specific error messages or logs would be helpful.
Based on the information you provided so far, here are some general steps and considerations for connecting to a Kafka topic and schema registry in DataHub:
General Steps for Kafka Ingestion
- Kafka Configuration:
  - Ensure that your Kafka broker and schema registry are correctly configured in your DataHub setup. You can refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|Kafka ingestion documentation> for detailed configuration options.
- Schema Registry Configuration:
  - If you are using a custom schema registry or a non-default schema type, you may need to provide a custom implementation of the KafkaSchemaRegistryBase class. Refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|Schema Registry section> for more details.
- EKS Specific Configuration:
  - Ensure that your EKS worker nodes have the correct permissions to access the schema registry. You can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md|Glue Schema Registry section> for more details on setting up permissions.
Example Configuration
Here is an example configuration for connecting to a Kafka topic and schema registry:
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "your-kafka-bootstrap-server:9092"
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
      schema_registry_url: "https://your-schema-registry-url"
      schema_registry_config:
        basic.auth.user.info: "${REGISTRY_API_KEY_ID}:${REGISTRY_API_KEY_SECRET}"
If you are using a custom schema registry, you might need to implement the KafkaSchemaRegistryBase class as shown in the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md#custom-schema-registry|documentation>.
Would you like more detailed steps or have specific questions about any part of the setup?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|Kafka Ingestion Documentation>
- <https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md|Glue Schema Registry>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|Schema Registry section>
Hint: Mention @runllm in the thread for followups.
I want to connect to the internal GMS schema registry from Apache NiFi.
When DataHub is configured to run with the internal schema registry, the registry is hosted on GMS at http://<GMS Hostname>:8080/schema-registry/api
Make sure that your Apache NiFi instance can reach this host and port.
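To check the registry from outside NiFi, a small script can help. This is only a sketch under assumptions: it assumes the messages use the Confluent wire format (one zero magic byte, a 4-byte big-endian schema ID, then the Avro payload) and that the GMS endpoint answers the Confluent-style `/schemas/ids/{id}` path. The host name `datahub-gms` and the helper names are placeholders, not anything from DataHub itself.

```python
import json
import struct
import urllib.request
from typing import Tuple

# Assumption: replace the host with your GMS service name; the path is the one given above.
REGISTRY_URL = "http://datahub-gms:8080/schema-registry/api"


def split_confluent_frame(raw: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload)."""
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("value does not start with the Confluent magic byte")
    (schema_id,) = struct.unpack(">I", raw[1:5])  # 4-byte big-endian schema ID
    return schema_id, raw[5:]


def schema_url(base: str, schema_id: int) -> str:
    """Build the Confluent-style lookup URL for a schema ID (assumed to be supported by GMS)."""
    return f"{base.rstrip('/')}/schemas/ids/{schema_id}"


def fetch_schema(base: str, schema_id: int):
    """Fetch and parse the Avro schema; this also confirms the host and port are reachable."""
    with urllib.request.urlopen(schema_url(base, schema_id), timeout=10) as resp:
        # Confluent-style registries return {"schema": "<schema as a JSON string>"}.
        return json.loads(json.load(resp)["schema"])
```

If `fetch_schema(REGISTRY_URL, schema_id)` returns a schema for an ID taken from a fetched message, the same base URL is presumably what a Confluent-compatible schema registry client in NiFi would need to point at.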