Hello - Does DataHub have any way to bulk load a bunch of tags instead of creating them manually? Like via JSON or Excel?
You can use this example Python script to create tags:
https://github.com/datahub-project/datahub/blob/4b87156fde8e428bddd6701501351a53578df2d7/metadata-ingestion/examples/library/create_tag.py
or GraphQL:
```
mutation createTag {
  createTag(input:
    {id: "dataquality:check",
     name: "Data Quality Check",
     description: "Data asset contains at least one data quality check"
    }
  )
}
```
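For many tags, the mutation above can be parameterized with GraphQL variables instead of being rewritten per tag. This is a minimal sketch that only builds the request bodies; it assumes the input type is named `CreateTagInput`, and the helper name `build_create_tag_payloads` is made up for illustration. Sending each body (an HTTP POST to your DataHub GraphQL endpoint with whatever auth your deployment needs) is left out on purpose.

```python
import json
from typing import Dict, List

# Assumption: createTag takes a single variable of type CreateTagInput.
CREATE_TAG_MUTATION = """
mutation createTag($input: CreateTagInput!) {
  createTag(input: $input)
}
"""


def build_create_tag_payloads(tags: List[Dict[str, str]]) -> List[str]:
    """Return one JSON request body per tag (hypothetical helper)."""
    payloads = []
    for tag in tags:
        body = {
            "query": CREATE_TAG_MUTATION,
            "variables": {
                "input": {
                    "id": tag["id"],
                    "name": tag["name"],
                    # Description is optional here; default to an empty string.
                    "description": tag.get("description", ""),
                }
            },
        }
        payloads.append(json.dumps(body))
    return payloads


tags = [
    {
        "id": "dataquality:check",
        "name": "Data Quality Check",
        "description": "Data asset contains at least one data quality check",
    },
    {"id": "system", "name": "System processes"},
]
payloads = build_create_tag_payloads(tags)
```

Each string in `payloads` is a complete request body, so the sending side stays a plain loop over the list.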
Thanks <@U0445MUD81W> - yes, we have this script, but as you can see it creates only one tag. So I guess there is no other way than to run it in a loop for all the tags I want to emit?
Is there anything like a built-in connector to read a JSON or Excel file with a specific format?
I thought you could make this iterative yourself. Anyway, here is the code.
Create the tag info in a YAML or JSON file like this:
tag_file.yml
```
- id: "system"
  name: System processes
  description: This is associated with system processes
- id: "pruning"
  name: Pruning processes
  description: tag with pruning processes
- id: "maintenance"
  name: Maintenance processes
  description: tag maintenance processes
```
Then use a Python script to read this file and create the tags:
```
from typing import Dict, List

import yaml
from datahub.emitter.mce_builder import make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import TagPropertiesClass
from importlib_resources import files
from importlib_resources.abc import Traversable


def get_yml_tag_entities(yml_resource: Traversable) -> List[Dict]:
    """Read the tag definitions from a YAML resource."""
    cfg_data = yaml.safe_load(yml_resource.read_text())
    # Accept either a top-level list of tags or a mapping with a "tags" key.
    if isinstance(cfg_data, list):
        return cfg_data
    if isinstance(cfg_data, dict):
        return cfg_data.get("tags", [])
    return []


def create_tag(emitter: DatahubRestEmitter, tag: Dict) -> None:
    """Emit the tag properties aspect for a single tag."""
    tag_urn = make_tag_urn(tag["id"])
    tag_properties_aspect = TagPropertiesClass(
        name=tag["name"],
        description=tag.get("description"),
    )
    mcp = MetadataChangeProposalWrapper(
        entityUrn=tag_urn,
        aspect=tag_properties_aspect,
    )
    emitter.emit(mcp)


if __name__ == "__main__":
    # Assumed GMS address (same as the linked example); adjust for your deployment.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    # "resources" must be an importable package that contains tag_file.yml.
    cfg_text: Traversable = files("resources").joinpath("tag_file.yml")
    if cfg_text.is_file():
        for tag in get_yml_tag_entities(cfg_text):
            create_tag(emitter, tag)
```
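Since the original question was about JSON or Excel, the same list of `id` / `name` / `description` dicts can also be read from a JSON file or from a CSV export of a spreadsheet, using only the standard library. This is a sketch under assumptions: the function names `load_tags_from_json` and `load_tags_from_csv` are made up here, and the CSV is assumed to have an `id,name,description` header row. Reading `.xlsx` directly would need an extra library and is not shown.

```python
import csv
import io
import json
from typing import Dict, List

# Fields the emitting loop cannot work without.
REQUIRED_KEYS = ("id", "name")


def _validate(tags: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fail early if an entry lacks a required field."""
    for position, tag in enumerate(tags):
        missing = [key for key in REQUIRED_KEYS if not tag.get(key)]
        if missing:
            raise ValueError(f"tag entry {position} is missing {missing}")
    return tags


def load_tags_from_json(text: str) -> List[Dict[str, str]]:
    """Parse a JSON array of {id, name, description} objects."""
    return _validate(json.loads(text))


def load_tags_from_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose header row is id,name,description."""
    return _validate([dict(row) for row in csv.DictReader(io.StringIO(text))])


json_text = (
    '[{"id": "system", "name": "System processes", '
    '"description": "This is associated with system processes"}]'
)
csv_text = "id,name,description\npruning,Pruning processes,tag with pruning processes\n"

json_tags = load_tags_from_json(json_text)
csv_tags = load_tags_from_csv(csv_text)
```

Each dict in `json_tags` or `csv_tags` has the same shape as a YAML entry, so it can be fed to the same `create_tag` loop.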