Troubleshooting dataPlatform load failure on Datahub GMS

user-1 · March 4, 2024, 4:47pm

Hello, I am running DataHub v0.11 on MySQL.
I accidentally sent lineage data with the wrong URN through Airflow today.
The URN was supposed to be of the form ‘urn:li:dataset:(urn:li:dataPlatform:hive,table_name,PROD)’, but I capitalized ‘hive’, so it was sent as ‘urn:li:dataset:(urn:li:dataPlatform:HIVE,table_name,PROD)’.
Since then, the DataHub main page has not been loading the dataPlatform properly.
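
A minimal sketch of a guard against this kind of mistake, assuming it is acceptable to lowercase the platform segment of a dataset URN before emitting it. The helper name `normalize_dataset_urn` and the regular expression are hypothetical and are not part of the DataHub SDK.

```python
import re

# Matches urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>).
# The name may itself contain commas, so the last comma separates the env.
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def normalize_dataset_urn(urn: str) -> str:
    """Return the dataset URN with its platform segment lowercased."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    platform, name, env = match.groups()
    # Only the platform is changed; the table name and env are kept as-is.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform.lower()},{name},{env})"
```

Calling it on the URN from this post, ‘urn:li:dataset:(urn:li:dataPlatform:HIVE,table_name,PROD)’, returns the intended lowercase-`hive` form; a URN that is already correct is returned unchanged.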

This is the DataHub GMS error log when I go to the main page.

```
	at com.linkedin.datahub.graphql.types.dataplatform.DataPlatformType.batchLoad(DataPlatformType.java:59)
	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$213(GmsGraphQLEngine.java:1813)
	... 17 common frames omitted
Caused by: java.lang.IllegalStateException: Duplicate key EntityAspectIdentifier(urn=urn:li:dataPlatform:hive, aspect=dataPlatformInfo, version=0) (attempted merging values com.linkedin.metadata.entity.EntityAspect@b79d8e5d and com.linkedin.metadata.entity.EntityAspect@b79d8e5d)
	at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
	at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
	at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:264)
	at com.linkedin.metadata.entity.EntityServiceImpl.getEnvelopedAspects(EntityServiceImpl.java:1748)
	at com.linkedin.metadata.entity.EntityServiceImpl.getCorrespondingAspects(EntityServiceImpl.java:391)
	at com.linkedin.metadata.entity.EntityServiceImpl.getLatestEnvelopedAspects(EntityServiceImpl.java:345)
	at com.linkedin.metadata.entity.EntityServiceImpl.getEntitiesV2(EntityServiceImpl.java:299)
	at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:124)
	at com.linkedin.datahub.graphql.types.dataplatform.DataPlatformType.batchLoad(DataPlatformType.java:44)
	... 18 common frames omitted
```
After seeing that error message, I suspected that a dataPlatform named 'HIVE' had been created and was causing a conflict. I checked the MySQL table and the dataplatformindex_v2 index in Elasticsearch, but 'urn:li:dataPlatform:HIVE' did not exist in either.
Is there anything else I can do here? Thank you.
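
One possible explanation, which this thread does not confirm, is that the database compares URNs case-insensitively, so a lookup for both ‘urn:li:dataPlatform:hive’ and ‘urn:li:dataPlatform:HIVE’ returns the same stored row twice even though no ‘HIVE’ row exists. The sketch below reproduces that effect with SQLite's `NOCASE` collation standing in for a case-insensitive MySQL collation. The `aspects` table, the `batch_get` function, and the query shape are hypothetical illustrations, not DataHub's actual schema or code.

```python
import sqlite3


def batch_get(conn, urns):
    """Fetch rows for the requested URNs, rejecting duplicate keys.

    The duplicate check mimics the behaviour of Java's Collectors.toMap,
    which throws when two elements map to the same key.
    """
    if not urns:
        return {}
    # Assumed query shape: one SELECT per requested key, joined by UNION ALL.
    query = " UNION ALL ".join(
        "SELECT urn, aspect FROM aspects WHERE urn = ?" for _ in urns
    )
    result = {}
    for urn, aspect in conn.execute(query, urns):
        key = (urn, aspect)
        if key in result:
            raise ValueError(f"Duplicate key {key}")
        result[key] = True
    return result


conn = sqlite3.connect(":memory:")
# NOCASE makes 'hive' and 'HIVE' compare equal in the urn column.
conn.execute("CREATE TABLE aspects (urn TEXT COLLATE NOCASE, aspect TEXT)")
conn.execute(
    "INSERT INTO aspects VALUES ('urn:li:dataPlatform:hive', 'dataPlatformInfo')"
)
```

With only the lowercase row stored, `batch_get(conn, ["urn:li:dataPlatform:hive", "urn:li:dataPlatform:HIVE"])` raises the duplicate-key error, because both lookups match the same row. Under a case-sensitive collation the second lookup would match nothing and the call would succeed.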

datahub_team · March 4, 2024, 4:47pm

Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!

Which DataHub version are you using? (e.g. 0.12.0)
Please post any relevant error logs on the thread!

user-2 · March 4, 2024, 4:47pm

Please follow this thread to resolve the issue
https://datahubspace.slack.com/archives/C029A3M079U/p1676616769274879

user-1 · March 4, 2024, 4:47pm

Problem solved. Thank you!!
