Troubleshooting missing lineage info in DataHub UI after successful Spark pipeline execution

Original Slack Thread

Hi Team
Ingestion: Spark jar (CLI)
DataHub version: 0.12.1
Source: Spark
I run datahub-spark-lineage:0.12.1-1 with spark-submit.
The Spark pipeline and Spark task show up correctly in the DataHub UI.
But there is no lineage info in the UI, although the lineage info is already in MySQL and in the Kafka MetadataChangeLog_Versioned_v1 topic.
The lineage info in the topic has values such as:
dataJoburn:li:dataJob:(urn:li:dataFlow:(spark,enterprise_green_certification_dws_test_spark_lineage,yarn),QueryExecId_2)$dataJobInputOutput{"inputDatasets":["urn:li:dataset:(urn:li:dataPlatform:hdfs,hdfs://intsig-bigdata-nameservice/user/hive/warehouse/staging_edw_company.db/s_db_qualification_certificate_t_certificate_main/dt_batch=202401230000,PROD)","urn:li:dataset:(urn:li:dataPlatform:hdfs,hdfs://intsig-bigdata-nameservice/user/hive/warehouse/staging_edw_company.db/s_db_qualification_certificate_t_management_system/dt_batch=202401230000,PROD)"],"outputDatasets":["urn:li:dataset:(urn:li:dataPlatform:hive,test.edw_company_dws_dg_hq_enterprise_green_certification_1d_df,PROD)"]} application/jsonc$no-run-id-provided$no-run-id-providedc@urn:li:corpuser:__datahub_system
What should I check? I'd really appreciate your response!

The most likely cause is that the upstream/downstream datasets don't exist in DataHub. We don't show lineage to datasets that don't exist.
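One way to act on this is to list every dataset URN that the dataJobInputOutput aspect references and check each one against DataHub. The snippet below is a minimal sketch of that idea, not an official tool. The helper names are hypothetical. The `exists` callback is a stand-in for whatever lookup you use (a search in the UI, a query against your metadata store, an SDK call) and is not a DataHub API. The HDFS path in the example is a shortened placeholder, not the real path from the thread.

```python
import json
from typing import Callable, Iterable, List


def dataset_urns_from_aspect(aspect_json: str) -> List[str]:
    """Collect every dataset URN referenced by a dataJobInputOutput aspect."""
    aspect = json.loads(aspect_json)
    urns = aspect.get("inputDatasets", []) + aspect.get("outputDatasets", [])
    # Keep the original order (inputs first) and drop duplicates.
    return list(dict.fromkeys(urns))


def find_missing(urns: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the URNs for which the supplied lookup reports no entity."""
    return [urn for urn in urns if not exists(urn)]


# Example aspect: the output URN is the one from the thread,
# the input URN uses a placeholder path (assumption).
EXAMPLE_ASPECT = json.dumps(
    {
        "inputDatasets": [
            "urn:li:dataset:(urn:li:dataPlatform:hdfs,/example/input/path,PROD)"
        ],
        "outputDatasets": [
            "urn:li:dataset:(urn:li:dataPlatform:hive,"
            "test.edw_company_dws_dg_hq_enterprise_green_certification_1d_df,PROD)"
        ],
    }
)

if __name__ == "__main__":
    # Hypothetical lookup: pretend only Hive datasets have been ingested.
    known = lambda urn: "dataPlatform:hive" in urn
    for urn in find_missing(dataset_urns_from_aspect(EXAMPLE_ASPECT), known):
        print("missing in DataHub:", urn)
```

Any URN this reports as missing is a candidate explanation for a lineage edge that is stored but not rendered.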

We are going to release a new Spark plugin soon that adds a way to materialise datasets in DataHub.
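Until then, one possible workaround is to create the missing dataset entities yourself, either by ingesting the Hive/HDFS sources or by writing a minimal aspect for each URN. The sketch below takes the second route and rests on several assumptions: that GMS accepts proposals at `/aspects?action=ingestProposal` (as far as I know, the endpoint the REST emitter uses), that a `status` aspect with `removed: false` is enough for the dataset to count as existing, and that `gms_url` and `token` are placeholders for your own deployment. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def status_proposal(dataset_urn: str) -> dict:
    """Build a proposal body that upserts a 'status' aspect for one dataset."""
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": dataset_urn,
            "changeType": "UPSERT",
            "aspectName": "status",
            "aspect": {
                "contentType": "application/json",
                # The aspect value is sent as a JSON-encoded string.
                "value": json.dumps({"removed": False}),
            },
        }
    }


def send_proposal(gms_url: str, proposal: dict, token: Optional[str] = None) -> int:
    """POST the proposal to GMS and return the HTTP status code.

    Assumption: the endpoint path and headers match your DataHub deployment.
    """
    request = urllib.request.Request(
        f"{gms_url}/aspects?action=ingestProposal",
        data=json.dumps(proposal).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
        method="POST",
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send_proposal("http://localhost:8080", status_proposal(urn))` for each missing URN would be the intended usage, with the host and port adjusted to wherever your GMS runs.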