Hi <@U01GZEETMEZ>, can you review this and let me know what the issue with this script is? Thanks!
Scripts used:
1. Business glossary.yaml
```
source: DataHub
owners:
  users:
    - fency
url: "https://github.pie.apple.com/storeAnalytics/datahub"
nodes:
  - name: Classification
    description: A set of terms related to Data Classification
    knowledge_links:
      - label: Wiki link for classification
        url: "https://en.wikipedia.org/wiki/Classification"
    terms:
      - name: Sensitive
        description: Sensitive Data
        custom_properties:
          is_confidential: false
      - name: Confidential
        description: Confidential Data
        custom_properties:
          is_confidential: true
      - name: HighlyConfidential
        description: Highly Confidential Data
        custom_properties:
          is_confidential: true
        domain: "urn:li:domain:rsa"
  - name: PII
    description: All terms related to personal information
    terms:
      - name: email
        description: An individual's email address
        inherits:
          - Classification.Confidential
      - name: address
        description: A physical address
      - name: gender
        description: The gender identity of the individual
        inherits:
          - Classification.Sensitive
      - name: ssn
        description: social security number
        inherits:
          - Classification.Sensitive
        domain: "urn:li:domain:rsa"
```
2. Script used:
```
def snowflake_business_glossary():
    """Business glossary ingestion for Snowflake entities."""
    # `logger` and `cf` (server/token settings) are defined elsewhere in this module.
    from datahub.configuration.config_loader import load_config_file
    from datahub.ingestion.run.pipeline import Pipeline

    logger.info("Creating business glossary")
    pipeline = Pipeline.create(
        # This configuration is analogous to a recipe configuration.
        {
            "source": {
                "type": "datahub-business-glossary",
                "config": {
                    "file": "/business_glossary_recipe.yml",
                    "enable_auto_id": False,
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {
                    "server": f"{cf.server}",
                    "token": f"{cf.token}",
                },
            },
            "transformers": [
                {
                    "type": "simple_add_dataset_terms",
                    "config": {
                        "semantics": "PATCH",
                        # A repeated "term_urns" key would keep only the last value,
                        # so both URNs go in a single list.
                        "term_urns": [
                            "urn:li:glossaryTerm:PII.email",
                            "urn:li:glossaryTerm:PII.ssn",
                        ],
                    },
                }
            ],
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()
    pipeline.raise_from_status()
```
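Side note on the transformer config: a Python dict literal keeps only the last value when a key is repeated, so writing `term_urns` twice would silently drop the first URN. A minimal sketch of that behaviour in plain Python (no DataHub calls; the variable names `repeated` and `merged` are just for illustration):

```python
# The "term_urns" key is repeated on purpose to show what Python does with it.
repeated = {
    "semantics": "PATCH",
    "term_urns": ["urn:li:glossaryTerm:PII.email"],
    "term_urns": ["urn:li:glossaryTerm:PII.ssn"],
}

# Only the last assignment survives; PII.email is gone.
print(repeated["term_urns"])

# Putting both URNs in one list keeps them both.
merged = {
    "semantics": "PATCH",
    "term_urns": [
        "urn:li:glossaryTerm:PII.email",
        "urn:li:glossaryTerm:PII.ssn",
    ],
}
print(merged["term_urns"])
```

Python raises no error for the repeated key, so this kind of slip does not show up in the pipeline output.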
This module ran without any errors, but no terms were added to the datasets. Airflow log:
```
[2024-04-12, 10:29:51 PDT] {logging_mixin.py:154} INFO - {'cli_version': '0.12.0.2',
'cli_entry_location': '/usr/local/airflow/.local/lib/python3.8/site-packages/datahub/__init__.py',
'py_version': '3.8.17 (default, Aug 10 2023, 12:50:17) \n[GCC 8.5.0 20210514 (Red Hat 8.5.0-20)]',
'py_exec_path': '/usr/bin/python3.8',
'os_details': 'Linux-5.10.210-201.852.amzn2.x86_64-x86_64-with-glibc2.2.5',
'peak_memory_usage': '252.67 MB',
'mem_info': '252.67 MB',
'peak_disk_usage': '30.66 GB',
'disk_info': {'total': '53.67 GB', 'used': '30.66 GB', 'free': '23.01 GB'}}
[2024-04-12, 10:29:51 PDT] {logging_mixin.py:154} INFO - Source (datahub-business-glossary) report:
[2024-04-12, 10:29:51 PDT] {logging_mixin.py:154} INFO - {'events_produced': 21,
'events_produced_per_sec': 46,
'entities': {'glossaryNode': ['urn:li:glossaryNode:Classification', 'urn:li:glossaryNode:PII'],
'glossaryTerm': ['urn:li:glossaryTerm:Classification.Sensitive',
'urn:li:glossaryTerm:Classification.Confidential',
'urn:li:glossaryTerm:Classification.HighlyConfidential',
'urn:li:glossaryTerm:PII.email',
'urn:li:glossaryTerm:PII.address',
'urn:li:glossaryTerm:PII.gender',
'urn:li:glossaryTerm:PII.ssn']},
'aspects': {'glossaryNode': {'glossaryNodeInfo': 2, 'ownership': 2, 'institutionalMemory': 1, 'status': 2},
'glossaryTerm': {'glossaryTermInfo': 7, 'ownership': 7, 'domains': 2, 'glossaryRelatedTerms': 3, 'status': 7}},
'warnings': {},
'failures': {},
'start_time': '2024-04-12 17:29:50.610827 (now)',
'running_time': '0.45 seconds'}
[2024-04-12, 10:29:51 PDT] {logging_mixin.py:154} INFO - Sink (datahub-rest) report:
[2024-04-12, 10:29:51 PDT] {logging_mixin.py:154} INFO - {'total_records_written': 21,
'records_written_per_second': 33,
'warnings': [],
'failures': [],
'start_time': '2024-04-12 17:29:50.442028 (now)',
'current_time': '2024-04-12 17:29:51.064747 (now)',
'total_duration_in_seconds': 0.62,
'gms_version': 'null',
'pending_requests': 0}
```
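One thing that stands out in the source report: the `entities` section lists only `glossaryNode` and `glossaryTerm`. Assuming (not confirmed here) that `simple_add_dataset_terms` only acts on dataset entities flowing through the same pipeline, and that those would show up under a `dataset` key in the report, a rough check could look like this. The helper name `emitted_dataset_urns` is hypothetical, and the dict is copied by hand from the log:

```python
from typing import Dict, List

# Entities copied from the source report in the Airflow log.
source_entities: Dict[str, List[str]] = {
    "glossaryNode": [
        "urn:li:glossaryNode:Classification",
        "urn:li:glossaryNode:PII",
    ],
    "glossaryTerm": [
        "urn:li:glossaryTerm:Classification.Sensitive",
        "urn:li:glossaryTerm:Classification.Confidential",
        "urn:li:glossaryTerm:Classification.HighlyConfidential",
        "urn:li:glossaryTerm:PII.email",
        "urn:li:glossaryTerm:PII.address",
        "urn:li:glossaryTerm:PII.gender",
        "urn:li:glossaryTerm:PII.ssn",
    ],
}


def emitted_dataset_urns(entities: Dict[str, List[str]]) -> List[str]:
    """Return dataset URNs from the report (assumes a 'dataset' key would hold them)."""
    return entities.get("dataset", [])


datasets = emitted_dataset_urns(source_entities)
print(f"dataset entities emitted by the source: {len(datasets)}")
```

If that assumption holds, a run whose source emits no datasets would finish cleanly while leaving every dataset untouched, which matches what the log shows.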