Understanding and Resolving Errors in Ingesting Data from MongoDB

Original Slack Thread

Hi everyone, I have a question about ingesting my dataset from MongoDB.
The ingestion fails with an error; please help me.

Tools:
Using UI ingestion, with the CLI version explicitly set to 0.12.1.5
DataHub version: 0.12.1
Data source to integrate: MongoDB

Part of the recipe
maxSchemaSize: 300
useRandomSampling: true
enableSchemaInference: false

Part of the ingestion report
"cli": {
"cli_version": "0.12.1.5",
"cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
"py_version": "3.10.13 (main, Jan 17 2024, 06:53:56) [GCC 12.2.0]",
...
}

Part of the error message
UnboundLocalError: local variable 'schema_metadata' referenced before assignment

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

  1. Are you using UI or CLI for ingestion? UI (CLI version set to 0.12.1.5)
  2. Which DataHub version are you using? 0.12.1
  3. What data source(s) are you integrating with DataHub? mongodb

The full log is here:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 185, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 404, in run
    for wu in itertools.islice(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 126, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 150, in auto_workunit_reporter
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 206, in re_emit_browse_path_v2
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 247, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 385, in _batch_workunits_by_urn
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 163, in auto_materialize_referenced_tags
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/mongodb.py", line 474, in get_workunits_internal
    schema_metadata,
UnboundLocalError: local variable 'schema_metadata' referenced before assignment
[2024-03-26 09:29:38,785] DEBUG    {datahub.entrypoints:203} - DataHub CLI version: 0.12.1.5 at /usr/local/lib/python3.10/site-packages/datahub/__init__.py

The PR https://github.com/datahub-project/datahub/pull/10169 will fix it.
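
For readers unfamiliar with this class of error, here is a minimal sketch of how an UnboundLocalError like the one in the traceback can arise in Python. It is not DataHub's actual code: the function names and the dict standing in for the schema aspect are hypothetical, and the link to enableSchemaInference: false in the recipe is an assumption based on the recipe and the error message, not something confirmed in the thread.

```python
from typing import List, Optional


def build_aspects_buggy(enable_schema_inference: bool) -> List[object]:
    """Hypothetical stand-in for a source method; not DataHub's real code."""
    aspects: List[object] = ["datasetProperties"]
    if enable_schema_inference:
        # schema_metadata is only bound when this branch runs.
        schema_metadata = {"fields": ["_id"]}
        aspects.append(schema_metadata)
    # Referenced unconditionally: raises UnboundLocalError when the
    # branch above was skipped.
    return [aspects, schema_metadata]


def build_aspects_fixed(enable_schema_inference: bool) -> List[object]:
    """Same logic, with the variable bound on every code path."""
    aspects: List[object] = ["datasetProperties"]
    schema_metadata: Optional[dict] = None  # default when inference is off
    if enable_schema_inference:
        schema_metadata = {"fields": ["_id"]}
        aspects.append(schema_metadata)
    return [aspects, schema_metadata]


if __name__ == "__main__":
    try:
        build_aspects_buggy(enable_schema_inference=False)
    except UnboundLocalError as exc:
        print(f"buggy version failed: {exc}")
    print(build_aspects_fixed(enable_schema_inference=False))
```

Binding the variable to None before the conditional is one common way to avoid this kind of failure; whether the linked PR takes that approach is not stated in the thread.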