Resolving Error in Creating Business Glossary in DataHub with Glossary.yaml

Original Slack Thread

Hi Team, posting my issue here.
I am trying to create a business glossary in DataHub using the sample glossary.yaml from the docs (https://datahubproject.io/docs/generated/ingestion/sources/business-glossary).
While trying to ingest the YAML file, I keep getting this error:

```
owners
  field required (type=value_error.missing)
version
  field required (type=value_error.missing)
payload
  extra fields not permitted (type=value_error.extra)
title
  extra fields not permitted (type=value_error.extra)
```
Can anyone help me fix this issue?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

DataHub version: 0.11.0
Error log:

```
Execution finished with errors.
{'exec_id': 'cfac1f6e-099f-4291-ba58-4aa8ec1cc532',
 'infos': ['2023-12-05 13:34:42.848027 INFO: Starting execution for task with name=RUN_INGEST',
           "2023-12-05 13:34:46.876204 INFO: Failed to execute 'datahub ingest'",
           '2023-12-05 13:34:46.876353 INFO: Caught exception EXECUTING task_id=cfac1f6e-099f-4291-ba58-4aa8ec1cc532, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}
4 validation errors for BusinessGlossaryConfig
owners
  field required (type=value_error.missing)
version
  field required (type=value_error.missing)
payload
  extra fields not permitted (type=value_error.extra)
title
  extra fields not permitted (type=value_error.extra)
```

Hi, we followed up with you on the other thread! :slight_smile:

Ahh, I think I know what the issue is.

The example glossary YAML file needs to be loaded with an ingestion recipe,

like this one -> https://datahubproject.io/docs/next/generated/ingestion/sources/business-glossary#starter-recipe

This worked for me. I just had to manually add the Marketing domain in DataHub, because the ingestion complained that it did not exist there.

So you need to create a recipe like:

```
source:
  type: datahub-business-glossary
  config:
    # Coordinates
    file: /path/to/example_glossary.yaml
    enable_auto_id: true # recommended to set to true so datahub will auto-generate guids from your term names
```

and this is the example_glossary.yaml file -> https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml
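
For reference, here is a rough sketch of what the glossary file itself (the one the recipe's `file:` points to) might look like. It is trimmed down from the shape of the linked example, and the owner, node, and term names are made-up placeholders. The main thing it illustrates is that `version` and `owners` sit at the top level of the glossary file, which are the two fields the validation error reports as missing:

```yaml
# Hypothetical minimal glossary file; all names below are placeholders.
version: "1"          # required top-level field (see "version: field required")
source: DataHub
owners:               # required top-level field (see "owners: field required")
  users:
    - jdoe            # assumed username, replace with a real DataHub user
nodes:
  - name: Classification
    description: A set of terms related to data classification
    terms:
      - name: Sensitive
        description: Sensitive data
```

The recipe and the glossary are two separate files: the recipe describes the ingestion source, and the glossary file holds the terms.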