<@U06TM7M40RK> Can I ingest the same table twice into DataHub, and if so, how do I tell one table from the other?
You can ingest the same table twice, but if both ingestions produce the same URN, the metadata from one run will likely override the other. DataHub uses URNs to uniquely identify datasets, so two tables with the same URN are treated as a single dataset rather than as two separate entries.
To distinguish between two tables with the same name but from different sources, you can use platform instances. According to <https://forum.datahubproject.io/t/how-to-distinguish-ingestion-sources-with-the-same-data-platform-and-tables-but-different-host-and-port/454|this discussion>, platform instances can be used to differentiate datasets that have the same name but come from different environments or configurations. This involves specifying a unique platform instance for each source in your ingestion configuration.
If you are using the CLI or a recipe for ingestion, you can specify the platform instance in your ingestion recipe to ensure that each table is uniquely identified by its source. This is particularly useful if you have the same table name in different databases or environments.
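As a rough illustration of why this works, the sketch below builds dataset URNs by hand in plain Python, assuming the usual layout in which the platform instance is prefixed to the dataset name. The `dataset_urn` helper, the table name, and the instance names `warehouse_eu` and `warehouse_us` are hypothetical and only serve the example. In a real recipe you would typically set a `platform_instance` value in the source config and let DataHub construct the URN for you.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Hypothetical helper that mimics the shape of a DataHub dataset URN."""
    # Assumption: when a platform instance is set, it is prefixed to the name.
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


table = "sales.public.orders"  # same table name on two different hosts

# Without a platform instance both sources yield the same URN,
# so they would be treated as one dataset.
plain_a = dataset_urn("postgres", table)
plain_b = dataset_urn("postgres", table)

# With a distinct platform instance per source, the URNs differ.
urn_eu = dataset_urn("postgres", table, platform_instance="warehouse_eu")
urn_us = dataset_urn("postgres", table, platform_instance="warehouse_us")

print(plain_a == plain_b)
print(urn_eu)
print(urn_us)
```

The platform instance then also tells you which source a given table came from, since it is part of the identifier itself.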
Would you like more detailed steps on how to configure platform instances in your ingestion recipe?
Sources:
- <https://forum.datahubproject.io/t/how-to-distinguish-ingestion-sources-with-the-same-data-platform-and-tables-but-different-host-and-port/454|this discussion>
Hi <@U07HF0U8716>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.