Including the Catalog Name in Spark Emitted Table Names in Databricks

Original Slack Thread

When using the Spark listener in Databricks, is it possible to include the catalog name as part of the emitted table name?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

To include the catalog name as part of the emitted table name when using the Spark listener in Databricks, you need to configure the Spark lineage emitter accordingly. According to the linked Slack discussion, the user had to concatenate the catalog name onto the platform instance value in the spark.datahub.metadata.dataset.platformInstance configuration so that the catalog name is included in the dataset URN. Without this workaround, the upstream and downstream table URNs are missing the catalog component.
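Below is a minimal sketch of that workaround, assuming the DataHub Spark lineage listener (acryl-spark-lineage). The instance name my_instance, the catalog name, and the DataHub endpoint are placeholders, and on Databricks these settings would normally be set in the cluster's Spark config rather than in a SparkSession builder:

```python
from pyspark.sql import SparkSession

catalog_name = "main"  # hypothetical Unity Catalog name

spark = (
    SparkSession.builder
    .appName("lineage-example")
    # Attach the DataHub Spark lineage listener.
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    # Placeholder DataHub GMS endpoint.
    .config("spark.datahub.rest.server", "http://datahub-gms:8080")
    # Workaround: fold the catalog name into the platform instance so the
    # emitted dataset URNs carry the catalog component.
    .config(
        "spark.datahub.metadata.dataset.platformInstance",
        f"my_instance.{catalog_name}",
    )
    .getOrCreate()
)
```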

However, this approach may not scale if a single job runs tasks against Unity Catalog tables from multiple catalogs, since the platform instance is set once per Spark application. The available sources do not describe a more scalable solution or a direct configuration option that includes the catalog name automatically, without this manual concatenation.
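For illustration only, one way to cope with the per-application limitation is to submit a separate run per catalog, each with its own platform instance value. This is a hedged sketch, not a documented pattern; the job script, instance name, and catalog list are hypothetical:

```python
import subprocess

# Hypothetical catalogs handled by the same logical job.
for catalog in ["sales", "marketing"]:
    subprocess.run(
        [
            "spark-submit",
            "--conf", "spark.extraListeners=datahub.spark.DatahubSparkListener",
            # One platform instance per run, so each run's emitted
            # URNs carry the correct catalog component.
            "--conf",
            f"spark.datahub.metadata.dataset.platformInstance=my_instance.{catalog}",
            "my_job.py", "--catalog", catalog,
        ],
        check=True,
    )
```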

Would you like more detailed information on how to set up this configuration or explore other related aspects?

Sources:

  • Slack discussion

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><