Troubleshooting YAML ingestion error in DataHub 0.15.1

Original Slack Thread

Hi Team, I'm running into an issue ingesting glossary terms from a YAML file into DataHub 0.15.1. The same file worked with DataHub 0.10.1.
Error:

```
source
  value is not a valid dict (type=type_error.dict)
nodes
  extra fields not permitted (type=value_error.extra)
owners
  extra fields not permitted (type=value_error.extra)
version
  extra fields not permitted (type=value_error.extra)
```

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


The error suggests a problem with either the structure of your YAML file or the way the ingestion pipeline is processing it. Here are a few steps to troubleshoot and resolve the issue:

  1. Check YAML Structure: Ensure that your YAML file follows the correct structure as per the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/business-glossary#business-glossary-file-format|Business Glossary File Format>. The top-level keys should include version, source, owners, url, and nodes.

  2. Validate YAML Syntax: Make sure your YAML file is correctly formatted. YAML is sensitive to indentation and structure, so any misalignment can cause errors.

  3. Use a Valid Recipe: Ensure that your ingestion recipe is correctly configured. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/business-glossary|Business Glossary Ingestion Documentation>, your recipe should specify the source type as datahub-business-glossary and include the path to your YAML file.

  4. Example Recipe:

    source:
      type: datahub-business-glossary
      config:
        file: /path/to/your/glossary.yml
        enable_auto_id: true
    
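  5. Check for Extra Fields: The error reports nodes, owners, and version as extra fields and says that source is not a valid dict. These are the top-level keys of a glossary file, whereas a recipe expects source to be a mapping, so one possible cause is that the glossary file itself is being validated as the recipe. Confirm that the ingest command points at the recipe and that the glossary file is referenced only through the file option. Also check that none of these keys are duplicated or misplaced.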

  6. Version Compatibility: The documentation linked above is for version 0.13.1, and the glossary file schema or recipe options may differ in 0.15.1. Since the file worked on 0.10.1, compare it against the Business Glossary documentation for the version you are running.
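
For reference, here is a minimal sketch of a glossary file using the top-level keys listed in step 1. It is meant only to illustrate the expected shape, not to reproduce your file: the owner, URL, node, and term values are placeholders, and the exact set of accepted keys should be checked against the documentation for your DataHub version. This file is separate from the recipe, which points to it through the file option.

```yaml
# glossary.yml (hypothetical example; all values below are placeholders)
version: "1"              # glossary file format version
source: DataHub           # a plain string here, unlike the recipe's `source` mapping
owners:
  users:
    - example_user        # placeholder owner
url: "https://example.com/glossary"   # placeholder URL
nodes:
  - name: Classification  # placeholder glossary node
    description: Terms related to data classification
    terms:
      - name: Sensitive   # placeholder glossary term
        description: Sensitive data
```

Note that source is a string in the glossary file but a mapping in the recipe. If the glossary file is supplied where a recipe is expected, a "value is not a valid dict" error on source, like the one above, would be a plausible result.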

Would you like more detailed steps or further assistance with a specific part of the process?

