Injecting a Glossary from Excel into DataHub via API: Steps and Examples

Original Slack Thread

I'm new to DataHub. My need is to inject a glossary I have in Excel into DataHub via the API. Would you have any examples to meet this need? Many thanks to all.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

To ingest a glossary from an Excel file into DataHub, first convert the Excel data into a format that DataHub can ingest, such as YAML or CSV, and then load that file with the DataHub API or CLI. Here's a general approach based on the available data sources:

  1. Convert Excel to CSV or YAML: First, you need to convert your Excel glossary into a CSV or YAML file. This can be done using Python libraries like pandas for CSV or custom scripts for YAML.

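  2. Prepare the Ingestion File: Ensure that your CSV or YAML file is formatted correctly. For the csv-enricher source used in the recipe below, the resource column holds the URN of an entity that already exists in DataHub (for example a dataset), and glossary_terms holds the URNs of the terms to attach, written in square brackets. Note that csv-enricher enriches existing entities; it does not define the glossary terms themselves, which is what the YAML route (the datahub-business-glossary source) is for. For example, a CSV file might look like this (the dataset names are placeholders):

    resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.table1,PROD)",,[urn:li:glossaryTerm:term1],,,,"Description for table1",
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.table2,PROD)",,[urn:li:glossaryTerm:term2],,,,"Description for table2",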
    
  3. Create an Ingestion Recipe: You need to create a recipe file that specifies how to ingest the glossary into DataHub. Here’s an example of a YAML recipe file:

    source:
      type: csv-enricher
      config:
        filename: /path/to/your/glossary.csv
        write_semantics: PATCH
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"
    
  4. Ingest Using DataHub CLI: Use the DataHub CLI to ingest the glossary. Run the following command:

    datahub ingest -c path/to/your/recipe.yaml
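
For step 1, here is a minimal Python sketch of the Excel-to-YAML conversion. It assumes a sheet with hypothetical columns named Term and Definition, and it targets the layout that, as far as I know, the datahub-business-glossary source reads (top-level version, source, and owners, then nodes containing terms). Please check that layout against the docs for your DataHub version. The function names, the node name, and the owner are placeholders, not part of any DataHub API.

```python
import json
from typing import Dict, Iterable, List


def read_excel_rows(path: str) -> List[Dict[str, str]]:
    """Load the first sheet into a list of dicts (needs pandas and openpyxl)."""
    import pandas as pd  # imported lazily so glossary_yaml works without pandas

    frame = pd.read_excel(path, dtype=str).fillna("")
    return frame.to_dict(orient="records")


def _quote(text: str) -> str:
    # JSON string syntax is also valid for YAML double-quoted scalars.
    return json.dumps(text, ensure_ascii=False)


def glossary_yaml(
    rows: Iterable[Dict[str, str]],
    owner: str,
    node: str = "Imported Glossary",  # hypothetical parent node name
) -> str:
    """Render rows as a glossary YAML document with a single parent node.

    Assumes hypothetical column names "Term" and "Definition".
    """
    lines = [
        'version: "1"',
        "source: DataHub",
        "owners:",
        "  users:",
        f"    - {_quote(owner)}",
        "nodes:",
        f"  - name: {_quote(node)}",
        '    description: "Terms imported from Excel"',
        "    terms:",
    ]
    for row in rows:
        name = row["Term"].strip()
        if not name:
            continue  # skip blank spreadsheet rows
        lines.append(f"      - name: {_quote(name)}")
        lines.append(f"        description: {_quote(row['Definition'].strip())}")
    return "\n".join(lines) + "\n"


def convert(xlsx_path: str, yaml_path: str, owner: str) -> None:
    """Read the spreadsheet and write the YAML file next to your recipe."""
    with open(yaml_path, "w", encoding="utf-8") as handle:
        handle.write(glossary_yaml(read_excel_rows(xlsx_path), owner))
```

Calling convert("glossary.xlsx", "glossary.yml", owner="jdoe") would write the file. The recipe's source type would then be datahub-business-glossary rather than csv-enricher, assuming that source is available in your DataHub version.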
    

With this recipe, the CLI reads your file and writes the metadata it describes to DataHub through the REST sink. If you need more detailed steps or examples, please let me know!
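
Since the question was about the API specifically, here is a hedged sketch of the programmatic route using the DataHub Python SDK (the acryl-datahub package) instead of a recipe. It assumes the rows are dicts with hypothetical keys Term and Definition (for example from pandas.read_excel(...).to_dict(orient="records")), that the server is reachable at the same http://localhost:8080 address used in the recipe, and that no auth token is needed. The term_id helper and its naming convention are my own illustration, not something DataHub requires.

```python
import re
from typing import Dict, Iterable


def term_id(name: str) -> str:
    """Derive a URN-friendly id from a term name (hypothetical convention)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_")


def emit_terms(
    rows: Iterable[Dict[str, str]],
    server: str = "http://localhost:8080",
) -> int:
    """Emit one glossaryTermInfo aspect per row; returns how many were sent."""
    # Imported here so term_id can be used without the SDK installed.
    from datahub.emitter.mce_builder import make_term_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import GlossaryTermInfoClass

    emitter = DatahubRestEmitter(gms_server=server)
    sent = 0
    for row in rows:
        name = row["Term"].strip()
        if not name:
            continue  # skip blank spreadsheet rows
        info = GlossaryTermInfoClass(
            name=name,
            definition=row["Definition"].strip(),
            termSource="INTERNAL",
        )
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=make_term_urn(term_id(name)),
                aspect=info,
            )
        )
        sent += 1
    return sent
```

Each call to emit sends one aspect, so re-running the script overwrites the term info for terms with the same id rather than creating duplicates.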

Sources:

  • <Slack conversation on glossary ingestion>

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps!