And what’s strange: I have several Presto ingestions for the same database but different schemas within it, and one of them got ingested just fine (the super small one), while the others failed with this error
Hi team, here are some more logs I am finding regarding the NPE during DBT ingestions.
<@U04N9PYJBEW> have you had a chance to check on this?
Thanks!
2023-10-16 16:30:39,932 [qtp2141817446-4675] ERROR c.l.m.filter.RestliLoggingFilter:38 - Rest.li error:
com.linkedin.restli.server.RestLiServiceException: java.lang.RuntimeException: Failed to produce MCLs
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)
at com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:206)
…
…
…
Caused by: java.lang.RuntimeException: Failed to produce MCLs
at com.linkedin.metadata.entity.EntityServiceImpl.emitMCL(EntityServiceImpl.java:684)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestAspects(EntityServiceImpl.java:543)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposalSync(EntityServiceImpl.java:855)
at com.linkedin.metadata.entity.EntityServiceImpl.ingestProposal(EntityServiceImpl.java:757)
at com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$2(AspectResource.java:225)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
Hi Joshua, unfortunately we haven’t gotten to the root cause of this yet. Is there any way you could pinpoint the exact MCP that is causing the error? To do that, though, I think you’d have to build our metadata ingestion process locally and add some logging, as we don’t have sufficient logging right now
If you’re running this locally, can you look at the GMS logs? Or, if you’re running DataHub on Kubernetes, the logs of the GMS pod? That will help us debug the issue
Hi, I am on the same team as Josh; I just want to give you a little more explanation. We received the “unable to emit data to GMS” error (caused by the NPE) on Snowflake (UI-based ingestion) and dbt (UI-based ingestion), as well as Tableau (CLI-based ingestion). The interesting part is that it’s not consistent: I have ingested about 95-100 Tableau projects individually via the CLI and got the NPE in about 4 to 5 projects on the first run; however, on the second and third runs I get a similar number of errors, but not necessarily on the same projects or the same datasets. It’s very random.
As for Snowflake, we have narrowed it down to two schemas, and for dbt it was somewhat random, like Tableau.
Hey folks - we’re definitely looking to get to the bottom of this, but it’s tricky because it seems like it might be caused by some sort of race condition
One hypothesis I have is that we have issues with a particular aspect type or combination of types. To validate that, could you folks add the failure_log config and report back what gets written to that file?
```
...
failure_log:
  enabled: true
  log_config:
    filename: /tmp/rest_sink_failures.log
```
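In case it helps, I’d expect that block to sit under the datahub-rest sink’s config section of your recipe — a rough sketch, with the server value as a placeholder (adjust to match your actual recipe):
```
# Rough sketch of a recipe sink section - the server value is a placeholder
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms:8080"
    failure_log:
      enabled: true
      log_config:
        filename: /tmp/rest_sink_failures.log
```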
The other hypothesis is that multithreading in the rest sink is still causing issues - to validate that, can we check if the problem goes away if we set `max_threads: 1` in the rest sink config?
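That setting would go in the same place as the sketch above — something like this (again, the server value is just a placeholder):
```
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms:8080"
    max_threads: 1   # force single-threaded emission to test the race-condition hypothesis
```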
Please make sure you're on GMS 0.12.0 when testing. Thanks for bearing with us here - this is a complex problem, and we want to make sure we get to the bottom of it