Activating Data Lineage for Oracle Database in DataHub

Original Slack Thread

Hi, I've connected my Oracle database to DataHub, but even though I've enabled data lineage in the ingestion, no lineage shows up.
How do I activate it?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

To address the issue of not seeing data lineage in DataHub for your Oracle database, there are several steps and configurations you should verify:

  1. Ensure Lineage Extraction is Enabled: According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/oracle#config-details|DataHub documentation>, your Oracle ingestion configuration must include the settings that enable lineage extraction. For the Oracle source this typically means setting include_view_lineage and include_view_column_lineage to true. These options derive lineage from view definitions, so views must also be part of the ingestion.

  2. Verify Configuration: Double-check your ingestion configuration file (the recipe). The database connection details must be correct, and the lineage options from step 1 must sit under the Oracle source's config section rather than at the top level.

  3. Check for Known Issues: There might be specific bugs or limitations affecting your setup. For example, column-level lineage for Oracle is not an out-of-the-box feature, as mentioned in a <Slack discussion>. You may need to use the DataHub SDK to manually emit lineage.

  4. Manual Lineage Addition: If automatic lineage extraction is not working, you can manually add lineage using the DataHub UI or APIs. This is supported as of version 0.9.5, allowing you to add or remove upstream and downstream lineage edges manually (<https://datahubproject.io/docs/0.14.0/generated/lineage/lineage-feature-guide|Lineage Feature Guide>).

  5. Use DataHub SDK: If you need to manually emit lineage, you can use the DataHub SDK to define lineage relationships between datasets programmatically. Here’s an example of how you can use the DataHub Python SDK to emit lineage:

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )
    
    # Initialize the DataHub emitter (point it at your DataHub GMS endpoint)
    emitter = DatahubRestEmitter("http://localhost:8080")
    
    # Define the source and target datasets.
    # The names are placeholders; they must match the names of the ingested datasets.
    source_dataset_urn = make_dataset_urn(platform="oracle", name="source_table", env="PROD")
    target_dataset_urn = make_dataset_urn(platform="oracle", name="target_table", env="PROD")
    
    # Create the lineage relationship: source_table is an upstream of target_table
    upstream = UpstreamClass(
        dataset=source_dataset_urn,
        type=DatasetLineageTypeClass.TRANSFORMED,
    )
    
    lineage = UpstreamLineageClass(
        upstreams=[upstream],
    )
    
    # Emit the lineage to DataHub as a metadata change proposal on the target dataset.
    # This replaces any existing upstreamLineage aspect on that dataset.
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=target_dataset_urn,
            aspect=lineage,
        )
    )
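
For steps 1 and 2, here is a minimal sketch of what a recipe with the lineage options set might look like, written as a Python dict so the flags can be checked before running ingestion. The host, service name, user, and environment variable are placeholders, and the helper `flags_not_explicitly_enabled` is hypothetical (it is not part of DataHub). Defaults for these options can differ between versions, so the sketch simply reports anything that is not set to true explicitly.

```python
import os
from typing import List

# Recipe sketch for the Oracle source. All connection values are placeholders.
recipe = {
    "source": {
        "type": "oracle",
        "config": {
            "host_port": "localhost:1521",                     # placeholder
            "service_name": "ORCLPDB1",                        # placeholder
            "username": "datahub_reader",                      # placeholder
            "password": os.environ.get("ORACLE_PASSWORD", ""),  # placeholder
            "include_views": True,
            "include_view_lineage": True,
            "include_view_column_lineage": True,
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}

# Options this sketch treats as relevant for view-based lineage.
LINEAGE_FLAGS = (
    "include_views",
    "include_view_lineage",
    "include_view_column_lineage",
)


def flags_not_explicitly_enabled(recipe: dict) -> List[str]:
    """Return lineage-related flags that are absent or not set to True."""
    config = recipe.get("source", {}).get("config", {})
    return [flag for flag in LINEAGE_FLAGS if config.get(flag) is not True]


if __name__ == "__main__":
    missing = flags_not_explicitly_enabled(recipe)
    print("Flags to review:", missing)
```

With the `acryl-datahub` package installed, a dict of this shape can be passed to `Pipeline.create(...)` from `datahub.ingestion.run.pipeline` and run, or the same keys can be written into a YAML recipe for the `datahub ingest` CLI.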
    

If you have verified these settings and still do not see lineage, it might be a specific issue with your setup or a bug in the version you are using. Would you like more detailed steps or troubleshooting tips?
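
One troubleshooting check worth trying when manually emitted lineage does not appear: the URNs used in the lineage must match the URNs of the ingested datasets exactly, otherwise the lineage is attached to a different entity. The sketch below rebuilds the dataset URN string in the same format that `make_dataset_urn` produces and compares it with a URN copied from the dataset page in the DataHub UI. The schema-qualified name `hr.employees` and both helper functions are hypothetical examples, not values from this thread.

```python
def oracle_dataset_urn(name: str, env: str = "PROD") -> str:
    """Build a dataset URN string for the oracle platform."""
    return f"urn:li:dataset:(urn:li:dataPlatform:oracle,{name},{env})"


def urn_matches(manual_name: str, ingested_urn: str, env: str = "PROD") -> bool:
    """Check whether a manually chosen name yields the ingested dataset's URN."""
    return oracle_dataset_urn(manual_name, env) == ingested_urn


if __name__ == "__main__":
    # Hypothetical URN, as it might be copied from the DataHub UI.
    ingested = "urn:li:dataset:(urn:li:dataPlatform:oracle,hr.employees,PROD)"

    # A bare table name does not match a schema-qualified ingested dataset.
    print(urn_matches("employees", ingested))
    # The fully qualified name does.
    print(urn_matches("hr.employees", ingested))
```

If the names differ only in case or in a schema or database prefix, adjusting the `name` argument passed to `make_dataset_urn` in the SDK example is usually enough.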
