And what’s strange: I have several Presto ingestions for the same database but different schemas within it, and one of them got ingested just fine (the super small one), while the others failed with this error
Hi team, here are some more logs I am finding regarding the NPE during DBT ingestions.
<@U04N9PYJBEW> have you had a chance to check on this?
Thanks!
2023-10-16 16:30:39,932 [qtp2141817446-4675] ERROR c.l.m.filter.RestliLoggingFilter:38 - Rest.li error:
com.linkedin.restli.server.RestLiServiceException: java.lang.RuntimeException: Failed to produce MCLs
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)
at com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:206)
…
…
…
Caused by: java.lang.RuntimeException: Failed to produce MCLs
at com.linkedin.metadata.entity.EntityServiceImpl.emitMCL(EntityServiceImpl.java:684)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestAspects(EntityServiceImpl.java:543)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposalSync(EntityServiceImpl.java:855)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposal(EntityServiceImpl.java:757)
at com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$2(AspectResource.java:225)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
Hi Joshua, unfortunately we haven’t gotten to the root cause of this yet. Is there any way you could pinpoint the exact MCP that is causing the error? To do that, though, I think you’d have to build our metadata ingestion process locally and add some logging, as we don’t have sufficient logging right now
If you’re running this locally, can you look at the GMS logs? Or, if you’re running DataHub on Kubernetes, the logs of the GMS pod? That will help us debug the issue
Hi, I am on the same team as Josh; I just want to give you a little more explanation. We received the “unable to emit data to GMS” error (caused by the NPE) on Snowflake (UI-based ingestion) and dbt (UI-based ingestion), as well as Tableau (CLI-based ingestion). The interesting part is that it’s not consistent: I have ingested about 95-100 Tableau projects individually via the CLI and got the NPE in about 4 to 5 projects on the first run; however, on the second and third runs I get a similar number of errors, but not necessarily on the same projects or the same datasets. It’s very random.
As for Snowflake, we have narrowed it down to two schemas, and for dbt it was somewhat random, like Tableau.
Hey folks - we’re definitely looking to get to the bottom of this, but it’s tricky because it seems like it might be caused by some sort of race condition
One hypothesis I have is that we have issues with a particular aspect type or combination of types. To validate that, could you folks add the failure_log config and report back what gets written to that file?
```
...
failure_log:
  enabled: true
  log_config:
    filename: /tmp/rest_sink_failures.log
```
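In case it helps, I’d expect that block to sit under the datahub-rest sink’s config section of your recipe — a rough sketch, with the server value as a placeholder (adjust to match your actual recipe):
```
# Rough sketch of a recipe sink section - the server value is a placeholder
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms:8080"
    failure_log:
      enabled: true
      log_config:
        filename: /tmp/rest_sink_failures.log
```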
The other hypothesis is that multithreading in the rest sink is still causing issues - to validate that, can we check if the problem goes away if we set `max_threads: 1` in the rest sink config?
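That setting would go in the same place as the sketch above — something like this (again, the server value is just a placeholder):
```
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms:8080"
    max_threads: 1   # force single-threaded emission to test the race-condition hypothesis
```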
Please make sure you're on GMS 0.12.0 when testing. Thanks for bearing with us here - this is a complex problem, and we want to make sure we get to the bottom of it