Handling NPE Errors in DBT Ingestions and Snowflake Jobs in DataHub

Original Slack Thread

Hi team
Hello, this is the error message I am getting on dbt ingestion (DH version 11.0).
<@U04N9PYJBEW> Thanks for helping me out on this. This is the error from the Office Hours call just now.

```
{'error': 'Unable to emit metadata to DataHub GMS: java.lang.NullPointerException',
 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
          'message': 'java.lang.NullPointerException',
          'status': 500,
          'id': 'urn:li:dataset:
```

<@U01GZEETMEZ> have you seen an NPE from the rest sink before? Could we be producing invalid aspects as part of dbt ingestion?

Checking… What would I look for regarding an invalid aspect on a dbt ingestion?

Ah I’m mainly asking Harshal

Sure, I am also digging in some more on my side. Does this help?

I haven’t seen that before. Do you have the corresponding GMS logs handy?

Yes, I can get those.

Thanks <@U01GZEETMEZ>

This line looks suspicious

```
java.io.IOException: Unable to parse response body for Response{requestLine=POST /_bulk?timeout=1m HTTP/1.1, host=https://vpc-datahub-2frsptdmsahrwqd5lhurzzx4d4.us-west-2.es.amazonaws.com:443, response=HTTP/1.1 200 OK}
```
Is it possible that ES is unhealthy?

I suspect that this is a more general problem in the backend, and not something specific to dbt because it looks like it's failing on both the subtypes aspect and the main MCEs

In any case, the NPE definitely should be handled better, although I'm not exactly sure where it's originating from, since it seems unlikely that there's a bug in `RestliUtil.toTask` cc <@UV5UEC3LN>

Hmm, we did just add a second node to our OpenSearch cluster in dev. I don’t see any alarms either. I will double check.
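One way to double check is the `_cluster/health` endpoint that Elasticsearch and OpenSearch expose. The snippet below is a minimal sketch, not something run against this cluster: `fetch_health` and `summarize_health` are hypothetical helper names, the sample response is hand-written, and an AWS-hosted domain may require signed or authenticated requests that this sketch does not handle.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> str:
    """Condense a _cluster/health response into one line for quick triage."""
    status = health.get("status", "unknown")  # green, yellow, or red
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    return f"status={status} nodes={nodes} unassigned_shards={unassigned}"


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/_cluster/health (assumes no auth is needed)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample response, NOT real output from the cluster in this thread.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 3}
print(summarize_health(sample))
```

A `yellow` or `red` status, or a non-zero unassigned shard count shortly after adding a node, would be worth ruling out before digging further into the ingestion side.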

This is not only dbt. We are receiving a similar error for a few of the Snowflake jobs on certain schemas, while all other schemas run fine.

```
java.net.URISyntaxException: mismatched paren nesting: urn:li:dataset:(urn:li:dataPlatform:snowflake,dev_mart_db.ea_ops_trans.sharp_ai_conversations_dr_predict
```
This also seems problematic

There are tons of these ^
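To make the "mismatched paren nesting" message concrete: a dataset URN has the form `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`, so the string in the log above never closes its opening parenthesis. The check below is a rough sketch for scanning URNs pulled from logs. `has_balanced_parens` is a hypothetical helper, not DataHub's own URN parser, and the well-formed URN is made up for contrast.

```python
def has_balanced_parens(urn: str) -> bool:
    """Return True if every '(' in the string is closed by a later ')'."""
    depth = 0
    for ch in urn:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


# URN exactly as it appears in the pasted log line.
from_log = (
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
    "dev_mart_db.ea_ops_trans.sharp_ai_conversations_dr_predict"
)
# Hypothetical well-formed URN, for comparison only.
well_formed = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)"

print(has_balanced_parens(from_log))     # False
print(has_balanced_parens(well_formed))  # True
```

If many URNs fail this check, it suggests they are being truncated somewhere before they reach GMS, rather than being emitted malformed by one specific source.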

Yes, do you think this is an issue with DH or dbt?

Not sure, we haven’t seen this on other instances that are using DBT ingestion to my knowledge though

It’s also happening in snowflake I believe. I’ll confirm

The one above in the error message is a snowflake dataset so that checks out

The one above is a Snowflake dataset, but ingested through a dbt source.
We have a similar issue when ingesting Snowflake directly. That's the weird part: it gives us errors on two of the schemas and everything else runs smoothly, which is why I thought it was a source-side issue rather than a DH issue. But I am not so sure.

Here are some more clues. We are seeing an NPE when ingesting dbt; however, when searching DataHub for the Snowflake object, I also get a 500 error in the UI.

Getting the following error on a dbt ingestion:

And when I pull up that same table in DH, I also get an error: