Adding Entity Tags to Glossary Terms in DataHub

Original Slack Thread

Hello team! I have the following question: I know that tags cannot be attached to glossary terms by default. Can you tell me if I can somehow add the entity tags to the glossary terms so that I can bind glossary terms to specific tags?

I will be glad of any information that will help!

<@U06TM7M40RK> Can you help with the questions in this thread?


Hello! :wave:

To address your question, it’s true that by default, tags cannot be directly attached to glossary terms in DataHub. However, there are some workarounds and related features that might help you achieve similar functionality.

Current Capabilities and Workarounds

  1. Tags and Glossary Terms:

    • Tags and glossary terms are different entities in DataHub. Tags are informal labels used for search and discovery, while glossary terms are part of a controlled vocabulary with optional hierarchy (<https://datahubproject.io/docs/0.13.1/tags|source>).
  2. Using Domains:

  3. Feature Requests and Contributions:

  4. Actions Framework:

Would you like more detailed information on any of these points or guidance on how to set up a specific workaround?

<@U06TM7M40RK> OK, I have another question: Can I create a template to fill out the documentation in the dataset?

Creating a template for documentation in DataHub is a common request, but as of now, DataHub does not support predefined templates for documentation directly. However, there are ways to streamline the documentation process using Documentation Forms.

Documentation Forms

Documentation Forms in DataHub allow you to define a set of questions (prompts) that need to be answered for a data asset to be considered properly documented. These forms can be configured and assigned to specific assets or groups of assets.

Here’s how you can create and assign a Documentation Form:

  1. Define the Documentation Form in YAML:

    • You can define the form with specific questions that need to be answered. These questions can be required or optional and can be assigned at the asset-level or field-level.
    - id: 123456
      type: DOCUMENTATION  # Supported Types: DOCUMENTATION, VERIFICATION
      name: "Metadata Initiative 2024"
      description: "Ensure important data assets have all required metadata"
      prompts: 
        - id: "123"
          title: "Data Retention Time"
          description: "Apply Retention Time structured property to form"
          type: STRUCTURED_PROPERTY
          structured_property_id: io.acryl.privacy.retentionTime
          required: True
      entities: 
        urns:
          - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
      actors: 
        users:
          - urn:li:corpuser:jane@email.com
          - urn:li:corpuser:john@email.com
        groups:
          - urn:li:corpGroup:team@email.com
    
  2. Assign the Form to Assets:

    • You can assign the form to specific assets using their URNs or by using filters such as type, platform, domain, or container.
  3. Completion and Verification:

    • Once assigned, asset owners or specific users/groups can complete the form. Verification forms require a final sign-off to ensure all information is correct.

For more details on creating and using Documentation Forms, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/documentation-forms.md|DataHub Documentation Forms Guide>.

Limitations

  • As per current capabilities, there is no direct support for reusable templates for documentation that can be applied across multiple datasets without manually defining them each time.
  • There is no built-in feature to import/export these forms as templates for reuse.

Would you like more detailed steps or examples on setting up and using Documentation Forms in DataHub?

<@U06TM7M40RK> How do I create the forms.yml file if I only have CLI access to the Docker server and no IDE?

To create and manage forms.yml using only the CLI on a Docker server, you can follow these steps:

Steps to Create and Manage forms.yml using CLI

  1. Create the forms.yml file:

    • You can use a command-line text editor like nano or vi to create and edit the forms.yml file directly on your server.
    nano forms.yml
    

    Or

    vi forms.yml
    

    Then, you can add the content for your form. For example:

    - id: 123456
      type: DOCUMENTATION  # Supported Types: DOCUMENTATION, VERIFICATION
      name: "Metadata Initiative 2024"
      description: "Ensure important data assets have all required metadata"
      prompts: 
        - id: "123"
          title: "Data Retention Time"
          description: "Apply Retention Time structured property to form"
          type: STRUCTURED_PROPERTY
          structured_property_id: io.acryl.privacy.retentionTime
          required: True
      entities: 
        urns:
          - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
      actors: 
        users:
          - urn:li:corpuser:jane@email.com
          - urn:li:corpuser:john@email.com
        groups:
          - urn:li:corpGroup:team@email.com
    
  2. Upload the Form to DataHub:

    • Once you have created your forms.yml, you can use the DataHub CLI to upload the form.
    datahub forms upsert -f forms.yml
    

    This command will upsert the form into DataHub. If successful, you should see a message like Created form urn:li:form:....

  3. Verify the Form:

    • You can verify that the form has been created by fetching it using its URN.
    datahub forms get --urn urn:li:form:123456
    

    This command will return the metadata about your form.

Example Commands

Here is a complete example workflow:

# Create the forms.yml file using nano
nano forms.yml

# Add your YAML content and save the file

# Upsert the form into DataHub
datahub forms upsert -f forms.yml

# Verify the form by fetching it
datahub forms get --urn urn:li:form:123456
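
If no interactive editor is available on the server, a shell heredoc can write the file in one step. The following is a minimal sketch, assuming a POSIX shell. The YAML body repeats the example above, except that `type` is set to COMPLETION, which a later reply in this thread reports as a valid type after DOCUMENTATION was rejected. The upload command is left as a comment so the snippet runs without a DataHub connection.

```shell
#!/bin/sh
# Write forms.yml without an editor. The quoted 'EOF' delimiter stops the
# shell from expanding anything inside the YAML body.
cat > forms.yml <<'EOF'
- id: 123456
  type: COMPLETION
  name: "Metadata Initiative 2024"
  description: "Ensure important data assets have all required metadata"
  prompts:
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True
  entities:
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  actors:
    users:
      - urn:li:corpuser:jane@email.com
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com
EOF

# Print the file back to confirm it was written as intended.
cat forms.yml

# Then upload it as described above:
# datahub forms upsert -f forms.yml
```

The same pattern works for any other YAML file mentioned in this thread; only the file name and the body between the `EOF` markers change.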

Additional Resources

For more information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/documentation-forms.md|DataHub Documentation Forms Guide> and the <https://github.com/datahub-project/datahub/blob/master/docs/api/tutorials/forms.md|DataHub CLI Documentation>.

Would you like more detailed instructions on any of these steps?

<@U06TM7M40RK> How can I get the structured_property_id?

To get the structured_property_id for use in your Documentation Forms, you need to first create the structured property and then retrieve its ID. Here’s how you can do it using the DataHub CLI:

Steps to Create and Retrieve structured_property_id

  1. Create the Structured Property:

    • Define your structured property in a YAML file. For example, create a file named structured_properties.yml with the following content:
    - id: retention_time
      qualified_name: io.acryl.privacy.retentionTime
      type: string  # Supported types: date, string, number, urn, rich_text
      cardinality: SINGLE  # Supported options: SINGLE, MULTIPLE
      display_name: Retention Time
      description: "Retention time for the dataset"
      entity_types:  # Define which types of DataHub Assets the Property can be assigned to
        - dataset
      allowed_values:
        - value: "30 Days"
          description: "Retention period of 30 days"
        - value: "90 Days"
          description: "Retention period of 90 days"
        - value: "365 Days"
          description: "Retention period of 365 days"
        - value: "Indefinite"
          description: "Indefinite retention period"
    
  2. Upload the Structured Property:

    • Use the DataHub CLI to upload the structured property.
    datahub properties upsert -f structured_properties.yml
    

    This command will create the structured property in DataHub.

  3. Retrieve the structured_property_id:

    • After creating the structured property, you can retrieve its ID using the CLI.
    datahub properties get --urn urn:li:structuredProperty:retention_time
    

    This command will return the details of the structured property, including its ID, which you can then use in your Documentation Forms.

Example Commands

Here is a complete example workflow:

# Create the structured_properties.yml file using nano
nano structured_properties.yml

# Add your YAML content and save the file

# Upsert the structured property into DataHub
datahub properties upsert -f structured_properties.yml

# Retrieve the structured property ID
datahub properties get --urn urn:li:structuredProperty:retention_time
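
To see which property IDs a form file references before looking them up, a small shell helper can extract them and print the matching lookup command for each. This is a sketch only: `list_property_lookups` and `sample_forms.yml` are hypothetical names, and it assumes the URN is `urn:li:structuredProperty:` followed by the ID, as in the `get` command above. It prints the commands rather than running them, so it needs no DataHub connection.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DataHub CLI): list the structured
# property IDs referenced in a forms file and print a lookup command for each.
list_property_lookups() {
  awk '/^[[:space:]]*structured_property_id:/ {print $2}' "$1" |
    sort -u |
    while read -r prop_id; do
      echo "datahub properties get --urn urn:li:structuredProperty:$prop_id"
    done
}

# Small sample file so the sketch runs on its own.
cat > sample_forms.yml <<'EOF'
- id: 123456
  prompts:
    - id: "123"
      type: STRUCTURED_PROPERTY
      structured_property_id: retention_time
EOF

list_property_lookups sample_forms.yml
```

Running each printed command shows whether the property a prompt points at actually exists.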

Additional Resources

For more information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/properties.md|DataHub Structured Properties Guide> and the <https://github.com/datahub-project/datahub/blob/master/docs/api/tutorials/structured-properties.md|DataHub CLI Documentation>.

Would you like more detailed instructions on any of these steps?

<@U06TM7M40RK> I got an error like “Form type DOCUMENTATION does not exist. Please try again with a valid type.”

The error message “Form type DOCUMENTATION does not exist. Please try again with a valid type.” suggests that the form type you are using is not recognized. According to the available data sources, the valid form types are VERIFICATION and COMPLETION.

Here’s how you can correct the form type in your YAML configuration:

Correcting the Form Type

  1. Update the Form Type:

    • Change the form type from DOCUMENTATION to COMPLETION.
    - id: 123456
      type: COMPLETION  # Valid Types: COMPLETION, VERIFICATION
      name: "Metadata Initiative 2024"
      description: "Ensure important data assets have all required metadata"
      prompts: 
        - id: "123"
          title: "Data Retention Time"
          description: "Apply Retention Time structured property to form"
          type: STRUCTURED_PROPERTY
          structured_property_id: io.acryl.privacy.retentionTime
          required: True
      entities: 
        urns:
          - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
      actors: 
        users:
          - urn:li:corpuser:jane@email.com
          - urn:li:corpuser:john@email.com
        groups:
          - urn:li:corpGroup:team@email.com
    
  2. Upload the Corrected Form:

    • Use the DataHub CLI to upload the corrected form.
    datahub forms upsert -f forms.yml
    

Additional Steps

If you encounter any further issues, ensure that:

  • The structured property ID is correctly defined and exists.
  • The YAML syntax is correct.

Example Commands

Here is the complete example workflow with the corrected form type:

# Create or edit the forms.yml file using nano
nano forms.yml

# Add your YAML content and save the file

# Upsert the form into DataHub
datahub forms upsert -f forms.yml

# Verify the form by fetching it
datahub forms get --urn urn:li:form:123456
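
A quick local check can catch an unsupported form type before the upload is attempted. The sketch below is an assumption-laden illustration, not a DataHub feature: `check_form_types` and `rejected_form.yml` are hypothetical names, the accepted values are the two named in this reply (COMPLETION and VERIFICATION), and form-level keys are assumed to be indented by exactly two spaces so that prompt-level `type:` lines are ignored.

```shell
#!/bin/sh
# Hypothetical pre-upload check (not part of the DataHub CLI).
# Only lines that start with exactly two spaces followed by "type:" are
# treated as form-level types; deeper-indented prompt types are skipped.
check_form_types() {
  status=0
  for form_type in $(awk '/^  type:/ {print $2}' "$1"); do
    case "$form_type" in
      COMPLETION|VERIFICATION) ;;
      *) echo "unsupported form type: $form_type" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Sample file that reproduces the rejected type from the error above.
printf '%s\n' '- id: 123456' '  type: DOCUMENTATION' '  prompts:' \
  '    - id: "123"' '      type: STRUCTURED_PROPERTY' > rejected_form.yml

if check_form_types rejected_form.yml; then
  echo "form types look valid"
else
  echo "fix the form type before running: datahub forms upsert -f forms.yml"
fi
```

Pointing the function at the real forms.yml instead of the sample gives the same check for the file being uploaded.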

Additional Resources

For more information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/documentation-forms.md|DataHub Documentation Forms Guide> and the <https://github.com/datahub-project/datahub/blob/master/docs/api/tutorials/forms.md|DataHub CLI Documentation>.

Would you like more detailed instructions on any of these steps?

<@U06TM7M40RK> OK, but now the UI shows “Awaiting Documentation” on my entity. What should I do about that?

The “Awaiting Documentation” status in the UI indicates that the documentation form you created has been assigned to the entity, but it has not yet been completed. Here are the steps you can take to complete the documentation and update the status:

Steps to Complete the Documentation Form

  1. Access the DataHub UI:

    • Open your web browser and navigate to the DataHub UI (usually at http://localhost:9002 if running locally).
  2. Navigate to the Entity:

    • Find the entity that has the “Awaiting Documentation” status. This is typically a dataset or another type of data asset.
  3. Complete the Documentation Form:

    • Go to the Documentation or Forms tab for the entity. You should see the assigned form with the prompts/questions that need to be completed.
    • Fill out all the required fields in the form. These fields are based on the prompts you defined in your forms.yml.
  4. Submit the Form:

    • After completing all the required fields, submit the form. This should update the status from “Awaiting Documentation” to indicate that the documentation is complete.

Example Workflow

Here is a step-by-step example:

  1. Open DataHub UI:

    • Navigate to http://localhost:9002 in your web browser.
  2. Find the Entity:

    • Search for the entity (e.g., a dataset) that shows “Awaiting Documentation”.
  3. Complete the Form:

    • Go to the Documentation or Forms tab.
    • Fill out the required fields as defined in your form. For example, if your form includes a prompt for “Data Retention Time”, make sure to provide the necessary information.
  4. Submit the Documentation:

    • Once all required fields are filled out, submit the form.

Additional Resources

For more detailed instructions on how to use Documentation Forms, you can refer to the <https://github.com/datahub-project/datahub/blob/master/docs/features/feature-guides/documentation-forms.md|DataHub Documentation Forms Guide>.

Would you like more detailed steps or further assistance with any of these steps?

<@U06TM7M40RK> I don’t see anything in the UI like what you describe under “Complete the Documentation Form”:

  • Go to the Documentation or Forms tab for the entity. You should see the assigned form with the prompts/questions that need to be completed.
  • Fill out all the required fields in the form. These fields are based on the prompts you defined in your forms.yml.

I just see the sentence “Awaiting Documentation”.

I’m not able to find any data sources to answer your question.

Could you specify the version of DataHub you are using and any specific configurations you have applied?

If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> What should I do about “Awaiting Documentation” if I don’t see an assigned form with the prompts/questions that need to be completed?

I’m not able to find any data sources to answer your question.

Could you provide more details about your current DataHub setup, such as the version you are using and any specific configurations or customizations you have applied?

If this doesn’t help, please reach out to the DataHub support team for further assistance.