Managing Glossary Terms in DataHub

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> why is my id displaying on the UI rather than the name of my glossary term

name: Fund Legal Name
id: Fund-Legal-Name

user-2 · April 7, 2025, 12:05am

The issue you’re experiencing, where the ID of your glossary term is displayed instead of the name, is likely due to how the DataHub UI handles the rendering of glossary terms. According to the <https://github.com/datahub-project/datahub/blob/master/datahub-graphql-core/src/main/resources/entity.graphql|DataHub GraphQL schema>, the name field for glossary terms is deprecated, and the properties.name field should be used instead. This means that if the UI is not updated to use properties.name, it might default to displaying the ID.

Additionally, there have been known issues where tags and glossary terms created via APIs do not render correctly in the DataHub frontend, as mentioned in <https://github.com/datahub-project/datahub/issues/6831|GitHub Issue #6831>.

To resolve this, ensure that your glossary term is correctly defined with the properties field, and check if there are any updates or patches available for the DataHub version you are using that address this rendering issue.

Would you like more detailed steps or additional examples on how to address this issue?

Sources:

<https://github.com/datahub-project/datahub/issues/6831|GitHub Issue #6831>
<https://github.com/datahub-project/datahub/blob/master/datahub-graphql-core/src/main/resources/entity.graphql|DataHub GraphQL schema>

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> How do you change the id of an already existing business glossary term

user-2 · April 7, 2025, 12:05am

It seems likely that changing the ID of an already existing business glossary term in DataHub is not straightforward. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/business-glossary#generating-custom-ids-for-your-terms|DataHub documentation>, once a custom ID is selected for a glossary term, it cannot be easily changed. This is because IDs are meant to be stable and unique across the entire glossary.

If you need to change the ID, a possible workaround would be to create a new glossary term with the desired ID and then update any references to the old term to point to the new one. After ensuring all references are updated, you can then delete the old term.
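The "update references" step of this workaround can be pictured with a small, self-contained sketch. It works on plain in-memory data rather than the DataHub API: the `references` dictionary, the `repoint_terms` helper, and both URNs are hypothetical names chosen for illustration, so treat it as a model of the bookkeeping and not as a migration tool.

```python
# Hypothetical URNs: the old term ID and the replacement ID you created.
OLD_URN = "urn:li:glossaryTerm:Fund-Legal-Name"
NEW_URN = "urn:li:glossaryTerm:Business-Term.Fund-Legal-Name"


def repoint_terms(references, old_urn, new_urn):
    """Return a copy of `references` with old_urn swapped for new_urn.

    `references` maps an entity or column name to the list of glossary
    term URNs attached to it. The input is not modified.
    """
    updated = {}
    for entity, term_urns in references.items():
        swapped = [new_urn if urn == old_urn else urn for urn in term_urns]
        # dict.fromkeys drops duplicates while preserving order, in case
        # an entity already carried both the old and the new term.
        updated[entity] = list(dict.fromkeys(swapped))
    return updated


# Made-up example data: two columns that still point at the old term.
references = {
    "funds.fund_legal_name": [OLD_URN],
    "funds.fund_alias": [OLD_URN, NEW_URN],
}
for entity, urns in repoint_terms(references, OLD_URN, NEW_URN).items():
    print(entity, urns)
```

Only once every reference resolves to the new URN is it safe to delete the old term.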

Would you like more detailed steps or additional examples on how to perform this workaround?

Sources:

<https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/business-glossary#generating-custom-ids-for-your-terms|DataHub documentation>

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> how do you go about deleting old terms properly? I am ingesting the glossary terms via a file, and they have already been removed from it

user-2 · April 7, 2025, 12:05am

To properly delete old glossary terms in DataHub, especially when they have been removed from your ingestion file, you can follow these steps:

Use the DataHub CLI: You can use the DataHub CLI to delete glossary terms. This can be done by specifying the entity type and using the appropriate flags for deletion.
Perform a Hard Delete: If you want to completely remove the glossary terms from DataHub, you should perform a hard delete. This will remove the terms and their references across the metadata graph.

Here’s a command example for hard deleting glossary terms:
```
# Without filters this targets every glossary term; append --dry-run to preview first
datahub delete --entity-type glossaryTerm --hard
```
Use Filters if Necessary: If you want to delete specific terms rather than all, you can use filters to narrow down the entities you want to delete. For example, you can specify a query to match certain terms.
Check for References: Before deleting, ensure that the terms are not referenced elsewhere in your metadata. This can prevent issues with dangling references.
Test with Dry Run: Always use the --dry-run option first to see what entities will be affected by your delete command without actually performing the deletion.
Update Ingestion Files: Ensure that your ingestion files are updated to reflect the current state of your glossary. This will prevent re-ingestion of deleted terms.
Rollback if Necessary: If you accidentally delete terms, you can use the rollback feature to revert changes made by a specific ingestion run.
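The dry-run-then-delete sequence in these steps can be sketched in Python. The snippet only builds the command lines and does not run them. It assumes the `--urn`, `--hard`, and `--dry-run` flags of `datahub delete` and the `urn:li:glossaryTerm:<term-id>` URN format; the helper names and the two ID lists are made up for illustration.

```python
import shlex


def stale_term_ids(existing_ids, file_ids):
    """IDs that DataHub still holds but the glossary file no longer lists."""
    return sorted(set(existing_ids) - set(file_ids))


def delete_command(term_id, dry_run=True):
    """Build the argument list for deleting a single glossary term.

    dry_run defaults to True so the preview is the default behaviour.
    """
    args = [
        "datahub", "delete",
        "--urn", f"urn:li:glossaryTerm:{term_id}",
        "--hard",
    ]
    if dry_run:
        args.append("--dry-run")
    return args


# Hypothetical inputs: IDs currently in DataHub vs. IDs left in the file.
existing = ["Fund-Legal-Name", "Old-Term", "High"]
in_file = ["Fund-Legal-Name", "High"]

for term_id in stale_term_ids(existing, in_file):
    # Print the preview command; rerun with dry_run=False once verified.
    print(shlex.join(delete_command(term_id)))
```

Targeting one URN at a time keeps the blast radius small compared with deleting by entity type.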

For more detailed guidance, you can refer to the <https://datahubproject.io/docs/0.14.0/how/delete-metadata|DataHub documentation on deleting metadata>.


user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> how can you link dbt columns to glossary terms

user-2 · April 7, 2025, 12:05am

To link dbt columns to glossary terms in DataHub, you can use the meta_mapping and column_meta_mapping configurations in your dbt project. These configurations allow you to define actions such as adding tags, terms, or owners based on the metadata properties of your dbt models and columns.

Here’s a general approach based on the available documentation:

Define Meta Properties in dbt: In your dbt model files, you can define meta properties that will be used to map to glossary terms. For example, you might have a meta property like glossary_term: "Customer".
Configure Meta Mappings: In your DataHub ingestion recipe, you can define meta_mapping and column_meta_mapping to specify how these meta properties should be translated into actions like adding glossary terms. For example:
```
column_meta_mapping:
  glossary_term:
    match: ".*"
    operation: "add_term"
    config:
      term: "{{ $match }}"
```
This configuration will match any glossary_term meta property and add a glossary term with the same name to the column.
Ingest Metadata: Run the DataHub ingestion process with your configured recipe. This will apply the mappings and link the dbt columns to the specified glossary terms.
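To make the mapping above more concrete, here is a simplified Python simulation of what an `add_term` rule does with a column's `meta` block. It is a sketch and not DataHub's implementation. It assumes the rule's `match` pattern is applied as a regular expression to the meta value, that `{{ $match }}` is replaced with the matched text, and that the result is treated as a term ID and prefixed with `urn:li:glossaryTerm:`. The function name `apply_add_term` is hypothetical.

```python
import re

# The same rule as in the recipe, expressed as a Python dict.
COLUMN_META_MAPPING = {
    "glossary_term": {
        "match": ".*",
        "operation": "add_term",
        "config": {"term": "{{ $match }}"},
    }
}


def apply_add_term(meta, mapping):
    """Return the glossary term URNs that the add_term rules would attach."""
    urns = []
    for key, rule in mapping.items():
        if rule.get("operation") != "add_term" or key not in meta:
            continue
        matched = re.match(rule["match"], str(meta[key]))
        if not matched:
            continue
        # Substitute the matched text into the configured term template.
        term = rule["config"]["term"].replace("{{ $match }}", matched.group(0))
        # Assumption: a bare value is interpreted as a term ID.
        urns.append(f"urn:li:glossaryTerm:{term}")
    return urns


# A dbt column whose meta block carries glossary_term: "Customer".
print(apply_add_term({"glossary_term": "Customer"}, COLUMN_META_MAPPING))
```

Under these assumptions the meta value has to equal the ID of an existing term, not its display name, for the column to link to that term.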

For more detailed information, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>.

Would you like more detailed steps or additional examples on how to set this up?

Sources:

<https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/dbt#dbt-meta-automated-mappings|dbt meta automated mappings documentation>

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> that link is broken that you sent

user-2 · April 7, 2025, 12:05am

I apologize for the inconvenience. Let’s try a different approach to link dbt columns to glossary terms in DataHub.

To link dbt columns to glossary terms, you can follow these steps:

Define Meta Properties in dbt: In your dbt model files, you can define meta properties that will be used to map to glossary terms. For example, you might have a meta property like glossary_term: "Customer".
Configure Meta Mappings: In your DataHub ingestion recipe, you can define meta_mapping and column_meta_mapping to specify how these meta properties should be translated into actions like adding glossary terms. For example:
```
column_meta_mapping:
  glossary_term:
    match: ".*"
    operation: "add_term"
    config:
      term: "{{ $match }}"
```
This configuration will match any glossary_term meta property and add a glossary term with the same name to the column.
Ingest Metadata: Run the DataHub ingestion process with your configured recipe. This will apply the mappings and link the dbt columns to the specified glossary terms.

For more detailed information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/dbt/dbt.md|DataHub documentation on dbt meta automated mappings>.

Would you like more detailed steps or additional examples on how to set this up?

Sources:

<https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/dbt/dbt.md|DataHub documentation on dbt meta automated mappings>

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> when using “add_terms” the ingestion is creating new glossary terms rather than matching to the existing terms. How are you supposed to reference terms in the configs

column_meta_mapping:
  terms_list:
    match: ".*"
    operation: "add_terms"
    config:
      separator: ";"

example of how we have it

name: fund_legal_name
description: example descript
meta:
  terms_list: Business-Term.Fund-Legal-Name; Classification-Term.High

user-2 · April 7, 2025, 12:05am

When using the add_terms operation in DataHub’s ingestion framework, it’s important to ensure that the terms you are referencing in your configuration match the existing glossary terms in DataHub. If new terms are being created instead of matching existing ones, it could be due to a mismatch in the term identifiers or URNs.

Here are some steps to ensure that the terms are correctly referenced:

Use Correct URNs: Ensure that the terms in your terms_list are specified using their correct URNs. The URN format for glossary terms is typically urn:li:glossaryTerm:<term-id>. If you are using names like Business-Term.Fund-Legal-Name, make sure these are the correct identifiers used in your DataHub instance.
Check Term Existence: Verify that the terms you are referencing already exist in your DataHub glossary. You can do this by searching for the terms in the DataHub UI or using the DataHub CLI to list existing terms.
Consistent Naming: Ensure that the names or IDs used in your terms_list match exactly with those in the DataHub glossary. Any discrepancy in naming or casing can lead to the creation of new terms.
Configuration Example: Here’s how you might configure the column_meta_mapping to add existing terms:
```
column_meta_mapping:
  terms_list:
    match: ".*"
    operation: "add_terms"
    config:
      separator: ";"
```
Ensure that the terms_list in your dbt model’s meta section uses the correct identifiers for the terms.

If you continue to experience issues, you may need to manually verify the term URNs and ensure they are correctly referenced in your configuration.
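One way to check the URNs by hand is to reproduce the split-and-prefix step in Python and compare the result with the URNs of the terms that already exist. The sketch below is an approximation, not DataHub's code. It assumes each piece of `terms_list` is trimmed of surrounding whitespace, that a value already starting with `urn:li:glossaryTerm:` is used unchanged, and that anything else is treated as a term ID and given that prefix. The function name `terms_to_urns` is hypothetical.

```python
TERM_URN_PREFIX = "urn:li:glossaryTerm:"


def terms_to_urns(raw_value, separator=","):
    """Split a terms_list value and turn each piece into a term URN."""
    urns = []
    for piece in raw_value.split(separator):
        term_id = piece.strip()  # assumption: surrounding whitespace is ignored
        if not term_id:
            continue
        if term_id.startswith(TERM_URN_PREFIX):
            urns.append(term_id)  # already a full URN
        else:
            urns.append(TERM_URN_PREFIX + term_id)
    return urns


# The terms_list value from the dbt model in the question.
raw = "Business-Term.Fund-Legal-Name; Classification-Term.High"
for urn in terms_to_urns(raw, separator=";"):
    print(urn)
```

If a printed URN does not match the URN of an existing term exactly, including casing and any parent-node prefix in the ID, the ingestion has nothing to match against and the value shows up as a new term.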

Would you like more detailed steps or additional examples on how to address this issue?

Sources:

https://datahubspace.slack.com/archives/CUMUWQU66/p1725638977.402689
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/transformer/dataset_transformer.md

user-1 · April 7, 2025, 12:05am

<@U06TM7M40RK> how can I view an id for a glossary term

user-2 · April 7, 2025, 12:05am

To view the ID for a glossary term in DataHub, you can use the DataHub UI, CLI, or GraphQL API. Here are the steps for each method:

Using DataHub UI:
- Navigate to the glossary section in the DataHub UI.
- Search for the glossary term you are interested in.
- Once you find the term, you can view its details, including the ID, on the term’s detail page.
Using DataHub CLI:
- You can use the DataHub CLI to fetch details of a specific glossary term if you know its URN. The command would look something like this:
```
datahub get --urn "urn:li:glossaryTerm:<term-id>"
```
- This command will return the details of the glossary term, including its ID.
Using GraphQL API:
- You can execute a GraphQL query to fetch the details of a glossary term, including its ID. Here is an example query:
```
query getGlossaryTerm($urn: String!) {
  glossaryTerm(urn: $urn) {
    urn
    name
    properties {
      name
      description
    }
  }
}
```
- Pass the actual URN of the glossary term (for example, urn:li:glossaryTerm:<term-id>) as the $urn variable when you run the query.
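For the GraphQL route, a minimal Python sketch using only the standard library might look like the following. The `/api/graphql` path, the bearer-token header, and the helper names `build_payload` and `fetch_term` are assumptions to adapt to your deployment. The query omits the deprecated top-level `name` field and reads `properties.name` instead.

```python
import json
import urllib.request

QUERY = """
query getGlossaryTerm($urn: String!) {
  glossaryTerm(urn: $urn) {
    urn
    properties {
      name
      description
    }
  }
}
"""


def build_payload(term_id):
    """Build the GraphQL request body; the URN is passed as a variable."""
    return {
        "query": QUERY,
        "variables": {"urn": f"urn:li:glossaryTerm:{term_id}"},
    }


def fetch_term(base_url, token, term_id):
    """POST the query to DataHub and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/graphql",  # assumed endpoint path
        data=json.dumps(build_payload(term_id)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Inspect the payload without contacting a server.
print(json.dumps(build_payload("Fund-Legal-Name")["variables"]))
```

Calling `fetch_term` with your own base URL and access token should return the term's `urn`, whose last segment is the ID.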

Would you like more detailed steps or additional examples on how to use any of these methods?

Sources:

https://datahubspace.slack.com/archives/CV2KB471C/p1704459244.212249
https://github.com/datahub-project/datahub/blob/master/docs/managed-datahub/approval-workflows.md

datahub_team · April 7, 2025, 12:05am

Hi! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look.

Hope this helps!
