Hello team! I have a question: I know that tags cannot be attached to glossary terms by default. Is there any way to attach tags to glossary terms, so that I can bind glossary terms to specific tags?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g. 0.12.0)
What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a thumbs up or thumbs down so other folks in the Community know if you’re still looking for help on the topic!
To address your question: it’s true that, by default, tags cannot be attached directly to glossary terms in DataHub. However, there are some workarounds and related features that might help you achieve similar functionality.
Current Capabilities and Workarounds
Tags and Glossary Terms:
Tags and glossary terms are different entities in DataHub. Tags are informal labels used for search and discovery, while glossary terms are part of a controlled vocabulary with optional hierarchy (source: https://datahubproject.io/docs/0.13.1/tags).
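Because tags and glossary terms are separate entity types, each is addressed by its own URN, which is why a tag cannot simply stand in for a term. The sketch below builds both URN forms; the names passed in are hypothetical. As a usage idea, `datahub get --urn` prints the aspects stored for an entity, which is a quick way to see what a given glossary term actually carries before choosing a workaround.

```shell
#!/usr/bin/env bash
# Sketch: tags and glossary terms are different entity types with different URNs.
# The example names used here are hypothetical.
set -euo pipefail

# Tags are addressed as urn:li:tag:<name>
tag_urn() {
  printf 'urn:li:tag:%s\n' "$1"
}

# Glossary terms are addressed as urn:li:glossaryTerm:<name>
glossary_term_urn() {
  printf 'urn:li:glossaryTerm:%s\n' "$1"
}

# Usage (needs a DataHub CLI that can reach GMS; prints the term's stored aspects):
#   datahub get --urn "$(glossary_term_urn CustomerId)"
```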
Creating a documentation template in DataHub is a common request, but DataHub does not currently support predefined documentation templates directly. However, you can streamline the documentation process with Documentation Forms.
Documentation Forms
Documentation Forms in DataHub allow you to define a set of questions (prompts) that need to be answered for a data asset to be considered properly documented. These forms can be configured and assigned to specific assets or groups of assets.
Here’s how you can create and assign a Documentation Form:
Define the Documentation Form in YAML:
You can define the form with specific questions that need to be answered. These questions can be required or optional and can be assigned at the asset-level or field-level.
- id: 123456
  type: DOCUMENTATION # Supported Types: DOCUMENTATION, VERIFICATION
  name: "Metadata Initiative 2024"
  description: "Ensure important data assets have all required metadata"
  prompts:
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  actors:
    users:
      - urn:li:corpuser:jane@email.com
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com
Assign the Form to Assets:
You can assign the form to specific assets using their URNs or by using filters such as type, platform, domain, or container.
Completion and Verification:
Once assigned, asset owners or specific users/groups can complete the form. Verification forms require a final sign-off to ensure all information is correct.
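Filter-based assignment can be sketched as a second form file written with a heredoc, so no interactive editor is needed. The `filters` keys (`types`, `platforms`, `domains`, `containers`) follow the DataHub forms example as we recall it, and the form id, prompt id, domain URN, and file name are hypothetical; please check them against the forms documentation for your DataHub version. `VERIFICATION` is used here because it is accepted in every variant of the type list discussed in this thread. With no `actors` section, the form is expected to go to the asset owners.

```shell
#!/usr/bin/env bash
# Sketch: write a form that targets assets by filter instead of by explicit URN.
# Form id, prompt id, domain URN and file name are hypothetical examples.
set -euo pipefail

cat > forms_by_filter.yml <<'EOF'
- id: 654321
  type: VERIFICATION
  name: "HDFS Retention Verification 2024"
  description: "Verify retention metadata on HDFS datasets"
  prompts:
    - id: "456"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    # Either a list of urns or a group of filters
    filters:
      types:
        - dataset
      platforms:
        - hdfs
      domains:
        - urn:li:domain:marketing
EOF

echo "Wrote forms_by_filter.yml"
```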
DataHub currently has no direct support for reusable documentation templates that can be applied across multiple datasets without defining them manually each time.
There is also no built-in feature to import or export these forms as templates for reuse.
Would you like more detailed steps or examples on setting up and using Documentation Forms in DataHub?
To create and manage forms.yml using only the CLI on a Docker server, you can follow these steps:
Steps to Create and Manage forms.yml using CLI
Create the forms.yml file:
You can use a command-line text editor like nano or vi to create and edit the forms.yml file directly on your server.
nano forms.yml
Or
vi forms.yml
Then, you can add the content for your form. For example:
- id: 123456
  type: DOCUMENTATION # Supported Types: DOCUMENTATION, VERIFICATION
  name: "Metadata Initiative 2024"
  description: "Ensure important data assets have all required metadata"
  prompts:
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  actors:
    users:
      - urn:li:corpuser:jane@email.com
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com
Upload the Form to DataHub:
Once you have created your forms.yml, you can use the DataHub CLI to upload the form.
datahub forms upsert -f forms.yml
This command will upsert the form into DataHub. If successful, you should see a message like “Created form urn:li:form:...”.
Verify the Form:
You can verify that the form has been created by fetching it using its URN.
datahub forms get --urn urn:li:form:123456
This command will return the metadata about your form.
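The upsert and verify steps can be wrapped in one small script. This is a sketch under a few assumptions: the `datahub` CLI is installed and can reach GMS (the CLI reads the `DATAHUB_GMS_URL` environment variable, and `http://localhost:8080` is the usual quickstart address), and the form URN is `urn:li:form:<id>`, matching the `id: 123456` and `urn:li:form:123456` pairing above. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: upsert a forms file, then fetch the form back by URN.
# Assumes the `datahub` CLI is installed and can reach GMS.
set -euo pipefail

# Build a form URN from the form's `id` field (123456 -> urn:li:form:123456).
form_urn() {
  printf 'urn:li:form:%s\n' "$1"
}

# Upsert the file, then read the form back to confirm it was created.
upsert_and_verify() {
  local file="$1" form_id="$2"
  if [ ! -f "$file" ]; then
    echo "missing forms file: $file" >&2
    return 1
  fi
  if ! command -v datahub >/dev/null 2>&1; then
    echo "datahub CLI not found on PATH" >&2
    return 1
  fi
  datahub forms upsert -f "$file"
  datahub forms get --urn "$(form_urn "$form_id")"
}

# Usage:
#   export DATAHUB_GMS_URL=http://localhost:8080
#   upsert_and_verify forms.yml 123456
```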
Example Commands
Here is a complete example workflow:
# Create the forms.yml file using nano
nano forms.yml
# Add your YAML content and save the file
# Upsert the form into DataHub
datahub forms upsert -f forms.yml
# Verify the form by fetching it
datahub forms get --urn urn:li:form:123456
To get the structured_property_id for use in your Documentation Forms, you need to first create the structured property and then retrieve its ID. Here’s how you can do it using the DataHub CLI:
Steps to Create and Retrieve structured_property_id
Create the Structured Property:
Define your structured property in a YAML file. For example, create a file named structured_properties.yml with the following content:
- id: retention_time
  qualified_name: io.acryl.privacy.retentionTime
  type: string # Supported types: date, string, number, urn, rich_text
  cardinality: SINGLE # Supported options: SINGLE, MULTIPLE
  display_name: Retention Time
  description: "Retention time for the dataset"
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
  allowed_values:
    - value: "30 Days"
      description: "Retention period of 30 days"
    - value: "90 Days"
      description: "Retention period of 90 days"
    - value: "365 Days"
      description: "Retention period of 365 days"
    - value: "Indefinite"
      description: "Indefinite retention period"
Upload the Structured Property:
Use the DataHub CLI to upload the structured property:
datahub properties upsert -f structured_properties.yml
This command will create the structured property in DataHub.
Retrieve the structured_property_id:
After creating the structured property, you can retrieve its ID using the CLI.
datahub properties get --urn urn:li:structuredProperty:retention_time
This command will return the details of the structured property, including its ID, which you can then use in your Documentation Forms.
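A hedged cross-check on IDs: assuming the CLI derives a structured property URN as `urn:li:structuredProperty:<id>` (consistent with `id: retention_time` being fetched as `urn:li:structuredProperty:retention_time` above), a form prompt whose `structured_property_id` is `io.acryl.privacy.retentionTime` may point at a different URN, if the form CLI resolves that field the same way. The sketch prints both candidate URNs so each can be checked with `datahub properties get`; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the structured property URNs a forms setup may depend on.
# Assumption: property URNs take the form urn:li:structuredProperty:<id>.
set -euo pipefail

sp_urn() {
  printf 'urn:li:structuredProperty:%s\n' "$1"
}

# The id from structured_properties.yml and the id referenced in forms.yml.
property_id="retention_time"
form_property_id="io.acryl.privacy.retentionTime"

for id in "$property_id" "$form_property_id"; do
  echo "check with: datahub properties get --urn $(sp_urn "$id")"
done
```

If only one of the two URNs resolves, aligning the property `id` with the form's `structured_property_id` is the likely fix.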
Example Commands
Here is a complete example workflow:
# Create the structured_properties.yml file using nano
nano structured_properties.yml
# Add your YAML content and save the file
# Upsert the structured property into DataHub
datahub properties upsert -f structured_properties.yml
# Retrieve the structured property ID
datahub properties get --urn urn:li:structuredProperty:retention_time
The error message “Form type DOCUMENTATION does not exist. Please try again with a valid type.” suggests that the form type you are using is not recognized. According to the available data sources, the valid form types are VERIFICATION and COMPLETION.
Here’s how you can correct the form type in your YAML configuration:
Correcting the Form Type
Update the Form Type:
Change the form type from DOCUMENTATION to COMPLETION.
- id: 123456
  type: COMPLETION # Valid Types: COMPLETION, VERIFICATION
  name: "Metadata Initiative 2024"
  description: "Ensure important data assets have all required metadata"
  prompts:
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  actors:
    users:
      - urn:li:corpuser:jane@email.com
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com
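On a Docker server where nano or vi may not be installed, a heredoc is a non-interactive way to write the corrected form shown above (type COMPLETION) to forms.yml. Note that this overwrites any existing forms.yml in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: write the corrected form to forms.yml without an interactive editor.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the YAML.
set -euo pipefail

cat > forms.yml <<'EOF'
- id: 123456
  type: COMPLETION # Valid Types: COMPLETION, VERIFICATION
  name: "Metadata Initiative 2024"
  description: "Ensure important data assets have all required metadata"
  prompts:
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  actors:
    users:
      - urn:li:corpuser:jane@email.com
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com
EOF

echo "Wrote forms.yml"
```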
Upload the Corrected Form:
Use the DataHub CLI to upload the corrected form.
datahub forms upsert -f forms.yml
Additional Steps
If you encounter any further issues, ensure that:
The structured property ID is correctly defined and exists.
The YAML syntax is correct.
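Part of these checks can be run before each upsert. This sketch flags tab characters (YAML does not allow tabs for indentation) and any top-level form `type` other than COMPLETION or VERIFICATION, the two valid values named above. It is a grep-based heuristic that assumes the two-space layout used in this thread, not a YAML parser, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: cheap pre-flight checks on a forms file before `datahub forms upsert`.
# Heuristic only: it greps the text and assumes the two-space layout shown above.
set -euo pipefail

check_forms_file() {
  local file="$1" status=0 bad

  if [ ! -f "$file" ]; then
    echo "$file: not found" >&2
    return 1
  fi

  # YAML forbids tab characters in indentation.
  if grep -q "$(printf '\t')" "$file"; then
    echo "$file: contains tab characters" >&2
    status=1
  fi

  # Form-level "type:" sits at two spaces; prompt types (e.g. STRUCTURED_PROPERTY)
  # are indented deeper and are not matched by this pattern.
  bad=$(grep -E '^  type:' "$file" \
        | sed -E 's/^  type:[[:space:]]*([A-Za-z_]+).*/\1/' \
        | grep -vE '^(COMPLETION|VERIFICATION)$' || true)
  if [ -n "$bad" ]; then
    echo "$file: unsupported form type(s): $bad" >&2
    status=1
  fi

  return "$status"
}

# Usage:
#   check_forms_file forms.yml && datahub forms upsert -f forms.yml
```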
Example Commands
Here is the complete example workflow with the corrected form type:
# Create or edit the forms.yml file using nano
nano forms.yml
# Add your YAML content and save the file
# Upsert the form into DataHub
datahub forms upsert -f forms.yml
# Verify the form by fetching it
datahub forms get --urn urn:li:form:123456
The “Awaiting Documentation” status in the UI indicates that the documentation form you created has been assigned to the entity, but it has not yet been completed. Here are the steps you can take to complete the documentation and update the status:
Steps to Complete the Documentation Form
Access the DataHub UI:
Open your web browser and navigate to the DataHub UI (usually at http://localhost:9002 if running locally).
Navigate to the Entity:
Find the entity that has the “Awaiting Documentation” status. This is typically a dataset or another type of data asset.
Complete the Documentation Form:
Go to the Documentation or Forms tab for the entity. You should see the assigned form with the prompts/questions that need to be completed.
Fill out all the required fields in the form. These fields are based on the prompts you defined in your forms.yml.
Submit the Form:
After completing all the required fields, submit the form. This should update the status from “Awaiting Documentation” to indicate that the documentation is complete.
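Before looking for the form in the UI, it may help to confirm from the CLI that the form is attached to the dataset at all. This sketch builds the dataset URN used in forms.yml and reads one aspect with `datahub get --urn ... --aspect`. The aspect name `forms` and the helper names are assumptions; verify the aspect name against your DataHub version.

```shell
#!/usr/bin/env bash
# Sketch: inspect which forms are attached to a dataset.
# Assumptions: the `datahub` CLI can reach GMS, and form assignments are
# stored in an aspect named "forms" (verify for your DataHub version).
set -euo pipefail

# Build a dataset URN: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
dataset_urn() {
  printf 'urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)\n' "$1" "$2" "$3"
}

# Print the (assumed) forms aspect for the dataset.
show_dataset_forms() {
  datahub get --urn "$(dataset_urn "$1" "$2" "$3")" --aspect forms
}

# Usage:
#   show_dataset_forms hdfs SampleHdfsDataset PROD
```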
Example Workflow
Here is a step-by-step example:
Open DataHub UI:
Navigate to http://localhost:9002 in your web browser.
Find the Entity:
Search for the entity (e.g., a dataset) that shows “Awaiting Documentation”.
Complete the Form:
Go to the Documentation or Forms tab.
Fill out the required fields as defined in your form. For example, if your form includes a prompt for “Data Retention Time”, make sure to provide the necessary information.
Submit the Documentation:
Once all required fields are filled out, submit the form.
<@U06TM7M40RK> I don’t see anything in the UI like what you describe under “Complete the Documentation Form”:
“Go to the Documentation or Forms tab for the entity. You should see the assigned form with the prompts/questions that need to be completed.
Fill out all the required fields in the form. These fields are based on the prompts you defined in your forms.yml.”
All I see is the sentence “Awaiting Documentation”.
I’m not able to find any data sources to answer your question.
Could you specify the version of DataHub you are using and any specific configurations you have applied?
If this doesn’t help, please reach out to the DataHub support team for further assistance. Hint: Mention @runllm in the thread for followups.
I’m not able to find any data sources to answer your question.
Could you provide more details about your current DataHub setup, such as the version you are using and any specific configurations or customizations you have applied?
If this doesn’t help, please reach out to the DataHub support team for further assistance.