Using multiple `connect_uris` in the same recipe for Kafka Connect

Original Slack Thread

using multiple connect_uris in the same recipe <@U06TM7M40RK> for kafka-connect

The data sources provided do not explicitly describe using multiple connect_uris in a single Kafka Connect recipe. The configuration examples they contain focus on other aspects, such as platform instances and bootstrap servers.

If you are trying to configure multiple Kafka Connect URIs, you might try specifying them as a list, or take an approach similar to the one used for multiple bootstrap servers. Neither option is confirmed by the provided data sources, so I recommend checking the official Kafka Connect documentation or reaching out to the DataHub community for more precise guidance.
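Since list support is unconfirmed, a more cautious fallback is to generate one recipe per Connect endpoint. The following is a minimal sketch, assuming the kafka-connect source takes a single `connect_uri` value per recipe and that the sink is `datahub-rest`; the endpoint URLs, the `build_recipes` helper, and the sink address are hypothetical placeholders, not something taken from the thread.

```python
from copy import deepcopy

# Hypothetical Kafka Connect REST endpoints; replace with real ones.
CONNECT_URIS = [
    "http://connect-a.example.com:8083",
    "http://connect-b.example.com:8083",
]


def build_recipes(connect_uris, base_config=None, sink_server="http://localhost:8080"):
    """Return one recipe dict per Kafka Connect URI.

    Assumption: each recipe carries exactly one `connect_uri`.
    """
    recipes = []
    for uri in connect_uris:
        # Copy the shared settings so recipes do not share mutable state.
        config = deepcopy(base_config) if base_config else {}
        config["connect_uri"] = uri
        recipes.append(
            {
                "source": {"type": "kafka-connect", "config": config},
                "sink": {"type": "datahub-rest", "config": {"server": sink_server}},
            }
        )
    return recipes


recipes = build_recipes(CONNECT_URIS)
for recipe in recipes:
    print(recipe["source"]["config"]["connect_uri"])
```

Each resulting dict could be written out as its own recipe file and run separately, which keeps a failure on one Connect cluster from blocking ingestion from the others.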

Would you like more detailed step-by-step guidance on configuring Kafka Connect with multiple URIs, or is there another aspect of the configuration you need help with?


<@U06TM7M40RK> I have DynamoDB tables ingested and kafka-connect for the DynamoDB tables. How does the lineage mapping work?

To understand how lineage mapping works for DynamoDB tables ingested via Kafka Connect in DataHub, we need to consider the integration and configuration of Kafka Connect with DataHub. Here’s a summary based on the available data sources:

  1. Kafka Connect Integration: Kafka Connect can be used to capture lineage information between DynamoDB tables and other systems. This integration extracts source and sink connectors as data pipelines and represents lineage information between DynamoDB tables and other datasets. The lineage is typically represented as data jobs that show the flow of data from source datasets (DynamoDB tables) to Kafka topics and then to sink datasets.

  2. Lineage Representation: In DataHub, lineage is represented using URNs (Uniform Resource Names) that uniquely identify datasets, data jobs, and other entities. When you configure Kafka Connect to ingest data from DynamoDB tables, it creates lineage edges that map the flow of data from the source (DynamoDB) to the destination (Kafka topics or other systems).

  3. Configuration: To set up lineage mapping, you need to configure your Kafka Connect recipe to include the necessary platform instances and mappings. This involves specifying the platform instance for DynamoDB and any other systems involved in the data flow. The configuration can be done using the `platform_instance_map` or `connect_to_platform_map` options in your Kafka Connect recipe.

  4. Example Configuration: While the data sources do not provide a specific example for DynamoDB, the general approach involves defining the source and sink connectors in your Kafka Connect configuration and ensuring that the lineage information is correctly emitted to DataHub.
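The URN matching that the points above rely on can be sketched as follows. This is an illustration under stated assumptions, not a confirmed DynamoDB setup: the `dataset_urn` helper, the instance name `prod-dynamo`, the table name `us-west-2.orders`, and the topic `orders-topic` are all hypothetical. The point it shows is that a lineage edge only connects to the already-ingested DynamoDB table if the URN built on the Kafka Connect side is identical to the one the DynamoDB ingestion produced, including any platform instance prefix.

```python
def dataset_urn(platform, name, env="PROD", platform_instance=None):
    """Build a DataHub-style dataset URN string.

    When a platform instance is given, it is prefixed to the dataset name.
    """
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


# Hypothetical fragment of a kafka-connect source config.
kafka_connect_config = {
    "connect_uri": "http://localhost:8083",
    "platform_instance_map": {"dynamodb": "prod-dynamo"},
}

# Look up the instance that Kafka Connect lineage would use for DynamoDB.
instance = kafka_connect_config["platform_instance_map"].get("dynamodb")

# Hypothetical dataset name; it must match what the DynamoDB ingestion emitted.
table_name = "us-west-2.orders"

upstream = dataset_urn("dynamodb", table_name, platform_instance=instance)
downstream = dataset_urn("kafka", "orders-topic")

# A lineage edge is a pair of URNs: source dataset -> destination dataset.
lineage_edge = (upstream, downstream)
print(lineage_edge)
```

If the DynamoDB recipe was run with a different platform instance, or none at all, the two URNs differ and the lineage would point at a separate, empty dataset entity rather than the ingested table.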

If you need more detailed guidance on setting up the configuration or have specific questions about the process, please let me know!


Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><