<@U06TM7M40RK> give sample recipe.yaml for Confluent Kafka as an ingestion source
To create a recipe for ingesting metadata from Confluent Kafka into DataHub, you can use the following YAML configuration. This example demonstrates how to configure the Kafka source with Confluent Cloud, including the necessary authentication details for both the Kafka broker and the schema registry.
```
source:
  type: "kafka"
  config:
    platform_instance: "YOUR_CLUSTER_ID"
    connection:
      bootstrap: "abc-defg.eu-west-1.aws.confluent.cloud:9092"
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
      schema_registry_url: "https://abc-defgh.us-east-2.aws.confluent.cloud"
      schema_registry_config:
        basic.auth.user.info: "${REGISTRY_API_KEY_ID}:${REGISTRY_API_KEY_SECRET}"
    topic_patterns:
      allow: ['.*'] # Adjust this to specify which topics to include
      deny: ['_confluent.*'] # Adjust this to specify which topics to exclude
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080" # Replace with your DataHub GMS server URL
    token: "<Your DataHub API Token>" # Replace with your DataHub API token if required
```
Key Points:
- Bootstrap Server: The `bootstrap` field specifies the Kafka broker's address.
- Consumer Config: Includes the security protocol and SASL authentication details.
- Schema Registry: The `schema_registry_url` and `schema_registry_config` fields are used to connect to the Confluent schema registry.
- Topic Patterns: Use `allow` and `deny` to filter which topics to ingest.
- Sink Configuration: The `datahub-rest` sink is configured to send metadata to your DataHub instance.
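If you want to try this end to end, one common way to run a recipe like this is with the DataHub CLI. The sketch below assumes the recipe is saved as `recipe.yaml` and that the `${...}` placeholders are resolved from environment variables (DataHub recipes support environment-variable substitution); the key values shown are placeholders, not real credentials.
```
# Export the credentials referenced by the ${...} placeholders in the recipe.
# The values here are placeholders -- substitute your Confluent API keys.
export CLUSTER_API_KEY_ID="<your-cluster-api-key-id>"
export CLUSTER_API_KEY_SECRET="<your-cluster-api-key-secret>"
export REGISTRY_API_KEY_ID="<your-registry-api-key-id>"
export REGISTRY_API_KEY_SECRET="<your-registry-api-key-secret>"

# The Kafka source requires the kafka plugin: pip install 'acryl-datahub[kafka]'
datahub ingest -c recipe.yaml
```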
This configuration is based on the guidelines provided in the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/kafka#connecting-to-confluent-cloud|DataHub Kafka Source Documentation> and <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|GitHub documentation>.
Would you like more detailed steps or additional information on any specific part of the process?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/kafka#connecting-to-confluent-cloud|DataHub Kafka Source Documentation>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/kafka/kafka.md|GitHub documentation>
<@U06TM7M40RK> below is the YAML we got from GitHub:
```
source:
  type: "kafka"
  config:
    platform_instance: "YOUR_CLUSTER_ID"
    connection:
      bootstrap: "abc-defg.eu-west-1.aws.confluent.cloud:9092"
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
      schema_registry_url: "https://abc-defgh.us-east-2.aws.confluent.cloud"
      schema_registry_config:
        basic.auth.user.info: "${REGISTRY_API_KEY_ID}:${REGISTRY_API_KEY_SECRET}"
sink:
  # sink configs
```
However, in our case we don't have a schema registry URL. What would the YAML file be for us?
It seems likely that if you do not have a schema registry URL, you can modify the YAML configuration to exclude the schema registry settings. This is based on the understanding that the schema registry is used to manage schemas for Kafka topics, and if you are not using it, you can omit those parts of the configuration.
Here’s how you might adjust your YAML configuration:
```
source:
  type: "kafka"
  config:
    platform_instance: "YOUR_CLUSTER_ID"
    connection:
      bootstrap: "abc-defg.eu-west-1.aws.confluent.cloud:9092"
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
    topic_patterns:
      allow: ['.*'] # Adjust this to specify which topics to include
      deny: ['_confluent.*'] # Adjust this to specify which topics to exclude
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080" # Replace with your DataHub GMS server URL
    token: "<Your DataHub API Token>" # Replace with your DataHub API token if required
```
Key Adjustments:
- Schema Registry: Removed `schema_registry_url` and `schema_registry_config`, since you do not have a schema registry.
- Topic Patterns: Ensure you specify which topics to include or exclude using `allow` and `deny` (see the sketch after the next paragraph).
This configuration assumes that your Kafka topics do not require schema management through a registry. If your topics are using JSON or another format that does not require a schema registry, this setup should work.
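To make the topic filters concrete, here is a minimal sketch of the `topic_patterns` block; the topic names in the regexes (`orders`, `payments`) are hypothetical placeholders, so substitute your own. Both `allow` and `deny` take lists of regular expressions, and a topic is ingested only if it matches an allow pattern and no deny pattern.
```
# Hypothetical topic filters -- replace the regexes with your own topic names.
topic_patterns:
  allow:
    - 'orders\..*'    # hypothetical: matches orders.created, orders.shipped, ...
    - 'payments\..*'  # hypothetical: matches payments.authorized, ...
  deny:
    - '_.*'           # skip internal topics such as _confluent-*
```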
If you need further customization or have specific requirements, please let me know!
Sources:
- https://datahubspace.slack.com/archives/CV2UVAPPG/p1690545008.697309
- https://github.com/datahub-project/datahub/issues/5706
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><