Hi, I’ve connected my Oracle database to DataHub, but even though I’ve added the data lineage, it isn’t showing up.
How do I enable it?
Hey there! 
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!
To address the issue of not seeing data lineage in DataHub for your Oracle database, there are several steps and configurations you should verify:
- Ensure Lineage Extraction is Enabled: According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/oracle#config-details|DataHub documentation>, your ingestion configuration for Oracle needs to include the settings that enable lineage extraction. This typically means setting parameters like include_view_lineage and include_view_column_lineage to true.
- Verify Configuration: Double-check your ingestion configuration file (recipe) to make sure the database connection details are correct and that the lineage settings above are present under the Oracle source config.
- Check for Known Issues: There might be specific bugs or limitations affecting your setup. For example, column-level lineage for Oracle is not an out-of-the-box feature, as mentioned in a <Slack discussion>. You may need to use the DataHub SDK to manually emit lineage.
- Manual Lineage Addition: If automatic lineage extraction is not working, you can manually add lineage using the DataHub UI or APIs. This is supported as of version 0.9.5, allowing you to add or remove upstream and downstream lineage edges manually (<https://datahubproject.io/docs/0.14.0/generated/lineage/lineage-feature-guide|Lineage Feature Guide>).
- Use DataHub SDK: If you need to manually emit lineage, you can use the DataHub SDK to define lineage relationships between datasets programmatically. Here’s an example of how you can use the DataHub Python SDK to emit lineage:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# Initialize the DataHub emitter (point this at your own GMS endpoint)
emitter = DatahubRestEmitter("http://localhost:8080")

# Define the source and target datasets.
# The names must match the dataset names that Oracle ingestion produced.
source_dataset_urn = make_dataset_urn(platform="oracle", name="source_table", env="PROD")
target_dataset_urn = make_dataset_urn(platform="oracle", name="target_table", env="PROD")

# Create the lineage relationship: source_table is upstream of target_table
upstream = UpstreamClass(
    dataset=source_dataset_urn,
    type=DatasetLineageTypeClass.TRANSFORMED,
)
lineage = UpstreamLineageClass(upstreams=[upstream])

# Wrap the aspect in a metadata change proposal and emit it to DataHub.
# The upstreamLineage aspect is attached to the downstream (target) dataset.
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn=target_dataset_urn,
        aspect=lineage,
    )
)
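A possible reason for manually emitted lineage not appearing is that the URNs used in the script do not match the URNs of the datasets Oracle ingestion created (for example, a missing schema prefix or a different env). The sketch below is a dependency-free illustration of the string format that make_dataset_urn produces, so you can compare it with the URN shown on the dataset page in the UI. The helper names and the table names (hr.employees, hr.employee_summary) are hypothetical.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in the DataHub format (illustrative helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(parent_urn: str, field_path: str) -> str:
    """Build a column (schema field) URN for a given dataset URN."""
    return f"urn:li:schemaField:({parent_urn},{field_path})"


if __name__ == "__main__":
    # Hypothetical table names; replace them with the names shown in your DataHub UI.
    upstream = dataset_urn("oracle", "hr.employees")
    downstream = dataset_urn("oracle", "hr.employee_summary")
    print(upstream)
    print(downstream)
    print(schema_field_urn(upstream, "employee_id"))
```

If the printed URN differs from the one DataHub shows for the table, the lineage edge is being attached to a different entity than the one you are looking at.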
If you have verified these settings and still do not see lineage, it might be a specific issue with your setup or a bug in the version you are using. Would you like more detailed steps or troubleshooting tips?
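To make the configuration check concrete, here is a minimal sketch that builds an Oracle recipe as a Python dict and reports which lineage flags are not explicitly set to true. The flag names (include_view_lineage, include_view_column_lineage), the connection values, and the helper names are assumptions for illustration; confirm the options against the Oracle source docs for your DataHub version, since some flags may already default to true.

```python
from typing import Any, Dict, List

# Assumed lineage-related flag names; verify them for your DataHub version.
LINEAGE_FLAGS = ("include_view_lineage", "include_view_column_lineage")


def build_oracle_recipe(
    host_port: str,
    service_name: str,
    username: str,
    password: str,
    gms_server: str = "http://localhost:8080",
) -> Dict[str, Any]:
    """Return an ingestion recipe dict with view lineage explicitly enabled."""
    return {
        "source": {
            "type": "oracle",
            "config": {
                "host_port": host_port,
                "service_name": service_name,
                "username": username,
                "password": password,
                "include_views": True,
                "include_view_lineage": True,
                "include_view_column_lineage": True,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def flags_not_explicitly_enabled(recipe: Dict[str, Any]) -> List[str]:
    """List the lineage flags that are absent or not set to True in the recipe."""
    config = recipe.get("source", {}).get("config", {})
    return [flag for flag in LINEAGE_FLAGS if config.get(flag) is not True]


if __name__ == "__main__":
    # Placeholder connection details; substitute your own.
    recipe = build_oracle_recipe("localhost:1521", "ORCLPDB1", "datahub", "change-me")
    print(flags_not_explicitly_enabled(recipe))
```

The same dict can be written as a YAML recipe for the datahub ingest CLI, or, assuming the acryl-datahub package with the Oracle plugin is installed, passed to Pipeline.create(recipe) from datahub.ingestion.run.pipeline and run from Python.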
Sources:
- <Slack discussion>
- <https://datahubproject.io/docs/0.14.0/generated/lineage/lineage-feature-guide|Lineage Feature Guide>
Hint: Mention @runllm in the thread for followups.
Hi, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look
Hope this helps ! ><