Bulk Loading Tags into DataHub using a Python Script or YAML/JSON File

Original Slack Thread

Hello - does DataHub have any way to bulk load a bunch of tags instead of creating them manually? For example, via JSON or Excel?

You can use this example Python script to create tags:
https://github.com/datahub-project/datahub/blob/4b87156fde8e428bddd6701501351a53578df2d7/metadata-ingestion/examples/library/create_tag.py

or GraphQL:

```
mutation createTag {
  createTag(input: {
    id: "dataquality:check",
    name: "Data Quality Check",
    description: "Data asset contains at least one data quality check"
  })
}
```
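For many tags, the mutation above can be sent once per tag from a short script. The following is only a sketch under stated assumptions: it uses the Python standard library, passes the input as a GraphQL variable of type `CreateTagInput` (the type name is assumed from the DataHub GraphQL schema), and the endpoint URL and access token are placeholders that depend on your deployment.

```python
import json
import urllib.request
from typing import Dict, List

# Same createTag mutation as above, with the input passed as a variable.
# Assumption: the input type is named CreateTagInput in the DataHub schema.
CREATE_TAG_MUTATION = """
mutation createTag($input: CreateTagInput!) {
  createTag(input: $input)
}
"""


def build_create_tag_payload(tag: Dict[str, str]) -> Dict:
    """Build the JSON request body for one createTag call."""
    return {
        "query": CREATE_TAG_MUTATION,
        "variables": {
            "input": {
                "id": tag["id"],
                "name": tag["name"],
                "description": tag.get("description", ""),
            }
        },
    }


def create_tags(tags: List[Dict[str, str]], graphql_url: str, token: str) -> None:
    """POST one createTag mutation per tag; graphql_url and token are placeholders."""
    for tag in tags:
        request = urllib.request.Request(
            graphql_url,
            data=json.dumps(build_create_tag_payload(tag)).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        # GraphQL servers commonly report failures in an "errors" field of the body.
        if body.get("errors"):
            print(f"failed to create {tag['id']}: {body['errors']}")
```

A call might look like `create_tags(tags, "http://localhost:9002/api/graphql", token)`, where the URL is an assumed local address and `token` is a personal access token.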

Thanks - yes, we have this script, but as you can see it creates only one tag. So I guess there is no other way than to run this script in a loop for every tag I want to emit?
Is there anything like a built-in connector that reads a JSON or Excel file in a specific format?

I thought you could convert it to an iterative version yourself. Anyway, here is the code.

Put the tag definitions in a YAML (or JSON) file like this, under a top-level `tags` key:
tags_file.yaml

```
tags:
  - id: "system"
    name: System processes
    description: This is associated with system processes
  - id: "pruning"
    name: Pruning processes
    description: tag with pruning processes
  - id: "maintenance"
    name: Maintenance processes
    description: tag maintenance processes
```
Then use a Python script to read this file and create the tags:

```
from typing import Dict, List

import yaml
from datahub.emitter.mce_builder import make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import TagPropertiesClass
from importlib_resources import files
from importlib_resources.abc import Traversable


def get_yml_tag_entities(yml_resource: Traversable) -> List[Dict]:
    """Load tag definitions from a YAML file.

    Accepts either a bare list of tags or a mapping with a top-level "tags" key.
    """
    cfg_data = yaml.safe_load(yml_resource.read_text()) or {}
    if isinstance(cfg_data, list):
        return cfg_data
    return cfg_data.get("tags", [])


def create_tag(emitter: DatahubRestEmitter, tag: Dict) -> None:
    """Emit a tagProperties aspect for a single tag definition."""
    tag_urn = make_tag_urn(tag["id"])

    tag_properties_aspect = TagPropertiesClass(
        name=tag["name"],
        description=tag.get("description"),
    )

    mcp = MetadataChangeProposalWrapper(
        entityUrn=tag_urn,
        aspect=tag_properties_aspect,
    )
    emitter.emit(mcp)


if __name__ == "__main__":
    # Assumption: DataHub GMS is reachable here; change the address for your deployment.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # "resources" must be an importable package that contains tags_file.yaml.
    cfg_text: Traversable = files("resources").joinpath("tags_file.yaml")
    if cfg_text.is_file():
        # Emit one metadata change proposal per tag in the file.
        for tag in get_yml_tag_entities(cfg_text):
            create_tag(emitter, tag)
```
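Since the question also mentioned JSON and Excel, the same loop can be fed from those formats. The sketch below is an illustration rather than a DataHub feature: `tags_from_json`, `tags_from_csv` and `check_tag` are hypothetical helper names, the JSON layout mirrors the YAML file, and an Excel sheet is assumed to be exported as CSV with an `id,name,description` header row. The returned dictionaries use the same `id`, `name` and `description` keys that `create_tag` reads.

```python
import csv
import io
import json
from typing import Dict, List

# Fields every tag definition needs; description is optional.
REQUIRED_KEYS = ("id", "name")


def check_tag(tag: Dict[str, str]) -> Dict[str, str]:
    """Fail early on incomplete rows and normalize a missing description to ""."""
    missing = [key for key in REQUIRED_KEYS if not tag.get(key)]
    if missing:
        raise ValueError(f"tag {tag!r} is missing {missing}")
    return {
        "id": tag["id"],
        "name": tag["name"],
        "description": tag.get("description") or "",
    }


def tags_from_json(text: str) -> List[Dict[str, str]]:
    """Parse either a bare JSON list of tags or an object with a "tags" list."""
    data = json.loads(text)
    tags = data if isinstance(data, list) else data.get("tags", [])
    return [check_tag(tag) for tag in tags]


def tags_from_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose header row names the id, name and description columns."""
    return [check_tag(dict(row)) for row in csv.DictReader(io.StringIO(text))]
```

Each returned list can be passed to the same `for tag in ...` loop used for the YAML file.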