Adding Custom Properties to a Dataset in DataHub Using a Recipe Transformer

Original Slack Thread

<@U06TM7M40RK> how to add custom properties to a dataset in an ingestion recipe

To add custom properties to a dataset during ingestion in DataHub, you can use the simple_add_dataset_properties transformer. It takes key-value pairs under properties and attaches them to the dataset as custom properties. You configure it in the transformers section of your ingestion recipe YAML file, in one of four ways:

  1. Basic Configuration: Add custom properties to a dataset using the default settings.
  2. Replace Existing Properties: Set replace_existing: true to replace the dataset properties sent by the ingestion source.
  3. Overwrite Existing Properties: Set semantics: OVERWRITE (the default) to overwrite the dataset properties available for the dataset on DataHub GMS.
  4. Patch Existing Properties: Set semantics: PATCH to keep the dataset properties already on DataHub GMS and add the new ones.

Basic Configuration

transformers:
  - type: "simple_add_dataset_properties"
    config:
      properties:
        prop1: value1
        prop2: value2

Replace Existing Properties

transformers:
  - type: "simple_add_dataset_properties"
    config:
      replace_existing: true  # false is the default behavior
      properties:
        prop1: value1
        prop2: value2

Overwrite Existing Properties

transformers:
  - type: "simple_add_dataset_properties"
    config:
      semantics: OVERWRITE  # OVERWRITE is the default behavior
      properties:
        prop1: value1
        prop2: value2

Patch Existing Properties

transformers:
  - type: "simple_add_dataset_properties"
    config:
      semantics: PATCH
      properties:
        prop1: value1
        prop2: value2

Each of these configurations adds the specified properties to the datasets emitted by the ingestion run; they differ only in how properties that already exist are handled.
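
For context, the transformers block sits between the source and sink sections of a recipe. The following is a minimal sketch of a complete recipe using the PATCH variant. The mysql source, its connection details, the property names, and the server URL are all placeholder assumptions, not values from this thread; substitute your own connector and settings.

```yaml
# recipe.yaml: illustrative sketch only; source and sink values are assumptions
source:
  type: mysql                      # assumed connector; use the one you ingest from
  config:
    host_port: "localhost:3306"    # hypothetical host
    username: datahub_reader       # hypothetical user
    password: "${MYSQL_PASSWORD}"  # read from an environment variable

transformers:
  - type: "simple_add_dataset_properties"
    config:
      semantics: PATCH             # keep properties already on DataHub GMS
      properties:
        owner_team: data-platform  # hypothetical property
        data_tier: gold            # hypothetical property

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"  # assumed local GMS endpoint
```

A recipe like this is typically run from the CLI with datahub ingest -c recipe.yaml.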

Would you like more detailed instructions or have any specific questions about this process?

<@U06TM7M40RK> Can you help with the questions in this thread?

I’m not able to find any data sources to answer your question. Could you provide more details about the specific issue you are facing or the type of assistance you need with DataHub? If this doesn’t help, please reach out to the DataHub support team for further assistance.