Debugging 'Unable to emit metadata to DataHub GMS: java.lang.RuntimeException: Failed to produce MCLs' error after upgrading to 0.11.0

Original Slack Thread

Getting a new error after upgrading to 0.11.0. It is affecting one Snowflake ingestion, while multiple separate Snowflake ingestions are running completely fine. Curious if anyone has debugged this before.

```
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
         'message': 'java.lang.RuntimeException: Failed to produce MCLs',
         'status': 500,
```

<@U058HUSFHLL> Could you please share the GMS logs? Steps are here: https://datahubproject.io/docs/how/extract-container-logs/

<@U04N9PYJBEW> might help you

Anything I’m looking for specifically <@U0348BYAS56>? I found the INGEST PROPOSAL for the dataset aspects that seem to be failing in each nightly ingestion, but nothing insightful in the logs besides a bunch of ingest proposals for the dataset.
Each nightly ingestion seems to be failing for the same table, which is interesting. Reverting back to 0.10.5 fixed it, but I would like to stay on 0.11.0.

It happens to be a very large table, 500M rows, but I don’t see why that would make a difference. Also… when I check that table in DataHub it looks like it was ingested correctly; we got the most recent row counts and the operation aspect for it, even though the ingestion finishes with a failure.

Can you post the full error? Is there any extra info after the “500”? It would be great to see every example of the 500 error as well. <@U04UKA5L5LK> any ideas on why we might be getting this error from gms?

I can only really grab the error from the Sink (datahub-rest) report; the logs themselves from the GMS have no clear error. This is the whole error…

```
[2023-10-10, 04:06:15 PDT] {{pod_manager.py:381}} INFO -                'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
[2023-10-10, 04:06:15 PDT] {{pod_manager.py:381}} INFO -                         'message': 'java.lang.RuntimeException: Failed to produce MCLs',
[2023-10-10, 04:06:15 PDT] {{pod_manager.py:381}} INFO -                         'status': 500,
[2023-10-10, 04:06:15 PDT] {{pod_manager.py:381}} INFO -                         'id': 'urn:li:dataset:(urn:li:dataPlatform:snowflake,db1.schema1.table_name,PROD)'}}],
```

I’m realizing that all the tables do in fact seem to be successfully ingested, but since it’s registering this error it’s marking the ingestion as failed.

Yeah, currently if there are any sink errors, we’ll report the ingestion as failed. We’re in the process of changing that, but that’ll take some time. We’re likely missing one aspect for one table at the moment.

Hmm, missing from the source side or the sink? Like, is specifically the upsert failing? Are there any checks I can do?

It just means that we failed to send one aspect to DataHub. Unfortunately, we don’t have great logs on what that aspect was, but you can see which URN it’s for.
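For anyone digging into this later, here is a minimal sketch of a check you could run, assuming the acryl-datahub Python package is installed and GMS is reachable at http://localhost:8080 (replace with your own host). It looks up a couple of common aspects for the failing URN and re-emits a small aspect directly, so that any "Failed to produce MCLs" 500 can be reproduced outside the nightly run:

```python
# Sketch only: assumes acryl-datahub is installed and GMS is at http://localhost:8080.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DatasetPropertiesClass, StatusClass

# URN taken from the sink error above.
urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db1.schema1.table_name,PROD)"

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Check whether a couple of common aspects actually landed in GMS.
props = graph.get_aspect(entity_urn=urn, aspect_type=DatasetPropertiesClass)
status = graph.get_aspect(entity_urn=urn, aspect_type=StatusClass)
print("datasetProperties present:", props is not None)
print("status present:", status is not None)

# Re-emit a small aspect directly; if GMS cannot produce MCLs for this entity,
# the same 500 should surface here, with a fuller stack trace in the GMS logs.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit_mcp(
    MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=False))
)
```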

I see… in that case I’m not sure what the best way to get around this is. It is marking our daily Airflow task as a failure and causing noise. Anything you think I can try to get it working?

Ah I see. If it’s okay for now, you can skip ingesting the table via the table_pattern parameter (I assume this is Snowflake ingestion), which should avoid the error. And we’ll look into what’s causing this error.
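For reference, a rough sketch of that workaround, expressed as a recipe dict run through the DataHub Python SDK (credentials, account, and the denied table name are placeholders to adapt to your setup):

```python
# Sketch only: placeholder credentials and table name; adapt to your own recipe.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": "my_account",
                "username": "my_user",
                "password": "my_password",
                "warehouse": "my_warehouse",
                # Skip the one table whose aspect keeps failing to emit.
                "table_pattern": {
                    "deny": [r"db1\.schema1\.table_name"],
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```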

<@U04N9PYJBEW> If there is a bug opened for this or a way I can track it please let me know, thanks!

<@U04N9PYJBEW> Hey, checking in a few months later, this ingestion still fails each night because of one table’s aspect that is not correctly being sent to DataHub. Has there been any development on this? I would rather not exclude the table because its other aspects are getting emitted properly. Would prefer just a WARNING and the ingestion completing successfully, or ideally for the aspect to be emitted with proper logs if it fails. We are running 0.12.1.

Unfortunately we haven’t finished our ingestion status refactor yet, so this will still cause an ingestion failure. We don’t want to make all “failed to ingest” sink errors warnings, because if all MCPs fail to emit properly, that should certainly be considered an ingestion failure. Our desired end state is that we allow a certain percentage of sink failures, e.g. 1%, where the ingestion status is a warning, but if that threshold is passed we consider the ingestion failed. Building that requires counting how many MCPs are emitted and how many fail to get emitted, which we haven’t gotten around to yet. I’d like to get this in by the end of Q1 next year, but we don’t have any firm commitments there.
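To illustrate the thresholding idea described above (this is not existing DataHub code, just a sketch of the proposed behaviour with a hypothetical 1% threshold):

```python
# Hypothetical sketch of the proposed sink-failure threshold, not real DataHub code.
def ingestion_status(emitted_ok: int, emit_failures: int, threshold: float = 0.01) -> str:
    total = emitted_ok + emit_failures
    if total == 0 or emit_failures == 0:
        return "SUCCESS"
    failure_rate = emit_failures / total
    # A small fraction of sink failures downgrades the run to a warning
    # instead of a hard failure; above the threshold it is still a failure.
    return "WARNING" if failure_rate <= threshold else "FAILURE"

print(ingestion_status(emitted_ok=9990, emit_failures=10))   # WARNING (0.1% failed)
print(ingestion_status(emitted_ok=900, emit_failures=100))   # FAILURE (10% failed)
```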

Hi, I seemed to get this error too for Snowflake, but when I upgraded to 0.12.1 it worked. I keep getting this error, though, for Vertica and PowerBI resources.