Troubleshooting Lineage Between Databricks Tables and Tableau Dashboards

Original Slack Thread

Hey team, I am doing a POC on DataHub and I am not able to figure out the following:

  1. How do I get lineage from Databricks tables to Tableau dashboards?
  2. How do I get lineage from Databricks jobs to Tableau?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

To better assist you, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)?

This information will help tailor the guidance to your specific setup.

In the meantime, here are some general pointers based on your questions:

1. Databricks Tables to Tableau Dashboard Lineage
To achieve lineage from Databricks tables to Tableau dashboards, you need to ingest metadata from both Databricks and Tableau into DataHub.
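
For the Tableau side, here is a sketch of an ingestion recipe; the server URL, site, and credentials are placeholders you would replace with your own:

source:
  type: tableau
  config:
    connect_uri: 'https://your-tableau-server'
    site: 'your-site'
    username: 'your-username'
    password: 'your-password'
    ingest_tables_external: true

sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

The Tableau connector records each dashboard's upstream tables; when those table references resolve to the same dataset URNs produced by your Databricks ingestion, DataHub stitches the end-to-end lineage automatically.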

2. Databricks Jobs to Tableau Lineage
To capture lineage from Databricks jobs to Tableau, you can use the DataHub Spark Agent to push lineage metadata from your Spark jobs in Databricks to DataHub. This will help you track the flow of data from Databricks jobs to the resulting datasets used in Tableau.
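
As a sketch, the Spark agent is configured through your Databricks cluster's Spark config; the package version, server URL, and token below are placeholders you would replace with your own:

spark.jars.packages io.acryl:acryl-spark-lineage_2.12:0.2.16
spark.extraListeners datahub.spark.DatahubSparkListener
spark.datahub.rest.server http://your-datahub-gms:8080
spark.datahub.rest.token your-datahub-token

With the listener attached, each Spark run is emitted to DataHub as a DataFlow/DataJob with its input and output datasets, which gives you the Databricks-job side of the lineage graph.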

Would you like more detailed steps or examples on how to set up these ingestion pipelines?

<@U06TM7M40RK> I have ingested both sets of metadata now, but I am still not able to get the lineage between the tables and dashboards

To troubleshoot the issue of not seeing the lineage between Databricks tables and Tableau dashboards, let’s go through a few steps to ensure everything is set up correctly:

1. Verify Metadata Ingestion
Ensure that the metadata ingestion for both Databricks and Tableau has been successful and that the metadata is correctly visible in DataHub.

2. Check Lineage Extraction
Make sure that lineage extraction is enabled and working for both Databricks and Tableau. For Databricks, verify that the Unity Catalog or Hive connector is correctly configured. For Tableau, ensure that the connector is set up to extract lineage information.

3. Validate Lineage Data
Confirm that the lineage data is being correctly emitted and ingested into DataHub. You can use the DataHub CLI to check the presence of lineage aspects.
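
For example, assuming your Databricks dataset URN follows the usual platform/name/env pattern (the table name here is hypothetical), you can fetch the upstream lineage aspect directly:

datahub get --urn 'urn:li:dataset:(urn:li:dataPlatform:databricks,catalog.schema.your_table,PROD)' --aspect upstreamLineage

If this returns an empty result, the lineage edge was never emitted, and the problem is on the extraction side rather than in the UI.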

4. Manual Lineage Addition
If automatic lineage extraction is not working as expected, you can manually add lineage relationships between entities in DataHub.

Here are some additional steps and resources to help you:

Databricks and Tableau Lineage Documentation

Manual Lineage Addition
If you need to manually add lineage, you can use the DataHub UI to add or remove upstream and downstream lineage edges. This can be done in both the Lineage Visualization screen and the Lineage tab on entity pages.
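
Besides the UI, DataHub also supports file-based lineage, which is handy when you want to script a set of manual edges. A minimal sketch with hypothetical dataset names is below; note that the file-based source covers dataset-to-dataset edges, so dashboard edges may still need the UI:

source:
  type: datahub-lineage-file
  config:
    file: ./lineage.yml

# lineage.yml
version: 1
lineage:
  - entity:
      name: downstream_dataset_used_by_tableau
      type: dataset
      env: PROD
      platform: tableau
    upstream:
      - entity:
          name: catalog.schema.your_table
          type: dataset
          env: PROD
          platform: databricks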

Would you like more detailed steps on any of these points or specific examples on how to set up or troubleshoot the lineage extraction?

<@U06TM7M40RK> I am not able to ingest Databricks jobs from the Unity Catalog source

To ingest Databricks metadata from the Unity Catalog source, it's important to ensure that your configuration is set up correctly and that you have the necessary permissions. Note that the unity-catalog source primarily extracts catalogs, schemas, tables, and notebooks; lineage for Databricks jobs themselves is typically captured by the DataHub Spark agent rather than this source. Here are the steps and considerations to help you:

1. Ensure Correct Configuration
Make sure your ingestion recipe for the Unity Catalog source is correctly configured. Here is an example configuration:

source:
  type: unity-catalog
  config:
    workspace_url: 'https://your-databricks-workspace-url'
    token: 'your-databricks-token'
    include_hive_metastore: true
    warehouse_id: 'your-warehouse-id'
    include_table_lineage: true
    include_column_lineage: true
    profiling:
      method: "ge"
      enabled: true
      warehouse_id: 'your-warehouse-id'
    # other configurations

2. Permissions
Ensure that your Databricks service principal has the necessary permissions to access the SQL Warehouse, metastore, schemas, and tables. The permissions required are detailed in the <Databricks | DataHub Unity Catalog documentation>.

3. Debugging Ingestion Issues
If you are encountering issues, enable detailed logging to help diagnose the problem. Verbose logging is controlled by the CLI rather than the recipe: pass the --debug flag when running the ingestion:

datahub --debug ingest -c recipe.yaml

The verbose output includes stack traces for any failures, which usually point to the failing API call or missing permission.

4. Example Recipe
Here is a complete ingestion recipe, combining the source configuration above with a sink that writes the metadata to your DataHub instance:

source:
  type: unity-catalog
  config:
    workspace_url: 'https://your-databricks-workspace-url'
    token: 'your-databricks-token'
    include_hive_metastore: true
    warehouse_id: 'your-warehouse-id'
    include_table_lineage: true
    include_column_lineage: true
    profiling:
      method: "ge"
      enabled: true
      warehouse_id: 'your-warehouse-id'
    # other configurations

sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

5. Troubleshooting

  • Check Logs: Review the detailed logs to identify any specific errors or issues during the ingestion process.
  • Verify Permissions: Ensure that the service principal has the required permissions as outlined in the <Databricks | DataHub Unity Catalog prerequisites>.
  • Test Connectivity: Run the ingestion with the --test-source-connection flag (datahub ingest -c recipe.yaml --test-source-connection) to verify that DataHub can reach your Databricks Unity Catalog before running a full ingestion.

Would you like more detailed steps on any of these points or further assistance with a specific part of the process?
