Error Troubleshooting for Databricks Ingestion in DataHub 0.13.1

Original Slack Thread

Hi, I got this Databricks error for 0.13.1:
[2024-08-08 09:36:14,575] ERROR {datahub.entrypoints:201} - Command failed: '2162834762620156'
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 185, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
    raise e
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 405, in run
    for wu in itertools.islice(
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 147, in auto_stale_entity_removal
    for wu in stream:
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 171, in auto_workunit_reporter
    for wu in stream:
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 253, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 391, in _batch_workunits_by_urn
    for wu in stream:
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 184, in auto_materialize_referenced_tags
    for wu in stream:
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 91, in auto_status_aspect
    for wu in stream:
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 266, in get_workunits_internal
    yield from self.process_metastores()
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 396, in process_metastores
    yield from self.process_catalogs(metastore)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 409, in process_catalogs
    yield from self.process_schemas(catalog)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 432, in process_schemas
    yield from self.process_tables(schema)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 456, in process_tables
    yield from self.process_table(table, schema)
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 480, in process_table
    table.ref, self.notebooks[str(notebook_id)]
  File "/tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/utilities/file_backed_collections.py", line 301, in __getitem__
    raise KeyError(key)
KeyError: '2162834762620156'
[2024-08-08 09:36:14,612] DEBUG {datahub.entrypoints:203} - DataHub CLI version: 0.13.1 at /tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/lib/python3.10/site-packages/datahub/__init__.py
[2024-08-08 09:36:14,613] DEBUG {datahub.entrypoints:206} - Python version: 3.10.13 (main, Jan 17 2024, 06:53:56) [GCC 12.2.0] at /tmp/datahub/ingest/venv-unity-catalog-b3b89d5496a000a4/bin/python3 on Linux-5.10.216-204.855.amzn2.x86_64-x86_64-with-glibc2.36
[2024-08-08 09:36:14,613] DEBUG {datahub.entrypoints:211} - GMS config {'models': {}, 'patchCapable': True, 'versions': {'acryldata/datahub': {'version': 'v0.13.1', 'commit': '2873736eace1dbc6bd8d19206b0d4c4e2f02a535'}}, 'managedIngestion': {'defaultCliVersion': '0.13.1.2', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'timeZone': 'GMT', 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'prod'}, 'noCode': 'true'

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?



To help you resolve the issue with Databricks ingestion, I need a bit more information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery, Databricks)?

Based on the error message you provided, the failure is a KeyError for the key '2162834762620156', raised during a notebook lookup. The traceback ends in process_table (datahub/ingestion/source/unity/source.py, line 480) at the expression self.notebooks[str(notebook_id)], which suggests that a table references a notebook ID that is not present in the source's notebooks collection.
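
To make the failure mode concrete, here is a minimal sketch of the same lookup pattern using a plain Python dict. It is not DataHub's code: the `notebooks` contents and the helper names `lookup_strict` and `lookup_guarded` are hypothetical, and only the `notebooks[str(notebook_id)]` expression and the ID come from the traceback.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the source's notebook collection, keyed by string ID.
notebooks: Dict[str, str] = {"111": "etl_notebook"}


def lookup_strict(notebook_id: int) -> str:
    # Same shape as the failing expression: raises KeyError for an unknown ID.
    return notebooks[str(notebook_id)]


def lookup_guarded(notebook_id: int) -> Optional[str]:
    # Returns None instead of raising when the ID is not in the collection.
    return notebooks.get(str(notebook_id))


try:
    lookup_strict(2162834762620156)
except KeyError as err:
    print(f"missing notebook id: {err}")

print(lookup_guarded(2162834762620156))
```

The strict version fails as soon as a table points at a notebook that was never collected, which matches the shape of the error above; the guarded version shows one way such a reference could be skipped rather than aborting the run.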

According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/README.md|DataHub Metadata Ingestion documentation>, you can configure and execute a metadata ingestion pipeline through the UI, CLI, or SDK-based ingestion. If you are using the CLI, you might want to check your ingestion recipe to ensure that all necessary configurations are correctly set.
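
For a CLI run, the parts of the recipe worth checking can be sketched as a Python dictionary. This is only an illustration: the `unity-catalog` source type matches the virtualenv name in the traceback, but the option names (`workspace_url`, `token`, `include_notebooks`) and every value are assumptions to verify against the Unity Catalog source documentation for your CLI version, and `notebook_options` is a hypothetical helper.

```python
# Hypothetical recipe expressed as a dict; values are placeholders and
# option names are assumptions, not confirmed by the thread.
recipe = {
    "source": {
        "type": "unity-catalog",
        "config": {
            "workspace_url": "https://<workspace>.cloud.databricks.com",  # placeholder
            "token": "${DATABRICKS_TOKEN}",  # placeholder
            "include_notebooks": True,  # assumed option name
        },
    },
}


def notebook_options(recipe: dict) -> dict:
    """Return the notebook-related options found in the source config."""
    config = recipe.get("source", {}).get("config", {})
    return {key: value for key, value in config.items() if "notebook" in key}


print(notebook_options(recipe))
```

Listing the notebook-related options makes it easier to see which settings are in play when the error involves a notebook ID.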

Would you like more detailed steps or further assistance on this?
