How to Disable or Filter Tags during dbt Model Ingestion

Original Slack Thread

Hello everyone. I’m currently doing a POC, and one of the things I’m trying to do is either disable tag ingestion on dbt models or, at the very least, provide a list of the tags we want. We have a ton of tags used for scheduling that will just confuse end users who are trying to do data discovery. I know that for Snowflake ingestion there’s an attribute you can put in the recipe file, but I don’t see a similar one for dbt.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


Hi there! To help you more effectively, could you please provide the following information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)

Regarding your question about disabling or filtering tags during dbt ingestion: one option to look at is the meta_mapping section of your ingestion recipe. It controls which tags are added based on specific meta properties in your dbt models.

For example, you can use the meta_mapping configuration to specify only the tags you want to include:

meta_mapping:
  important_tag:
    match: ".*"
    operation: "add_tag"
    config:
      tag: "important_tag"
  another_tag:
    match: ".*"
    operation: "add_tag"
    config:
      tag: "another_tag"

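With this configuration, any dbt model that has an important_tag or another_tag meta property gets the corresponding tag, whatever the property's value (match is ".*"). Note that these rules act on meta properties; they do not filter tags declared in a model's dbt tags list.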
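
For reference, meta_mapping rules are keyed on entries in a model's meta block, which dbt keeps separate from its tags list. Here is a minimal sketch of a dbt properties file, assuming a hypothetical model named orders and hypothetical scheduling tags nightly and hourly:

```yaml
# models/schema.yml (hypothetical path and model name)
version: 2

models:
  - name: orders
    config:
      # Native dbt tags, e.g. used for scheduling or selection.
      tags: ["nightly", "hourly"]
      # Free-form metadata; meta_mapping rules key on these entries.
      meta:
        important_tag: "yes"   # would match an important_tag rule with match: ".*"
```

In this layout, an important_tag rule looks only at the meta entry. The nightly and hourly values live in the tags list, which is a separate field.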

For more details, you can refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/dbt/dbt.md|dbt meta automated mappings> documentation.

Would you like more detailed instructions or have any other specific questions?

<@U06TM7M40RK> That did not work.

Thanks for the feedback. Let’s dive deeper into your specific needs. It seems you want to either disable the ingestion of tags on dbt models or restrict ingestion to a specific list of tags. Unfortunately, the documentation available to me does not describe a direct way to disable tag ingestion for dbt or to supply a list of tags to include.

However, you can use the meta_mapping and column_meta_mapping configurations to control the ingestion of tags based on specific conditions. Here’s how you can approach this:

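  1. Disable Ingestion of Tags: To stop tags from being created out of meta properties, avoid defining any add_tag operations in your meta_mapping and column_meta_mapping configurations. This only covers tags derived from meta; it does not address tags declared directly on the dbt models.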

  2. Include Only Specific Tags: If you want to include only specific tags, you can define meta_mapping rules that match only those tags. Here’s an example:

meta_mapping:
  important_tag:
    match: ".*"
    operation: "add_tag"
    config:
      tag: "important_tag"
  another_tag:
    match: ".*"
    operation: "add_tag"
    config:
      tag: "another_tag"
column_meta_mapping: {}  # Add column-level tag mappings here if needed

  3. Filter Out Unwanted Tags: If you have tags that you don’t want to include, you can use regex patterns in the match field of your meta_mapping rules so that only the wanted values produce a tag. However, this approach might require you to list all unwanted tags explicitly.
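
As an illustration of regex matching in a rule, the sketch below tags a model only when a meta property holds one of a few allowed values. The property name catalog_visibility, its values, and the tag name discoverable are all hypothetical, chosen only for this example:

```yaml
meta_mapping:
  catalog_visibility:            # hypothetical meta property on the dbt model
    match: "public|curated"      # regex tested against the property's value
    operation: "add_tag"
    config:
      tag: "discoverable"        # hypothetical tag name
```

Models whose catalog_visibility is missing or holds another value get no tag from this rule. The rule governs only tags derived from meta properties.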

Here’s a more detailed example of how you can configure your meta_mapping:

meta_mapping:
  # Each key is the name of a meta property on the dbt model;
  # match is a regex tested against that property's value.
  important_tag:
    match: "important_tag"
    operation: "add_tag"
    config:
      tag: "important_tag"
  another_tag:
    match: "another_tag"
    operation: "add_tag"
    config:
      tag: "another_tag"
  # Meta properties with no rule here produce no tags.
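
For context, these mappings sit under the dbt source's config block in the recipe. Below is a sketch of a complete recipe. The artifact paths, the snowflake target platform, and the local server URL are assumptions to be replaced with your own values:

```yaml
source:
  type: dbt
  config:
    # Placeholder paths to the dbt artifacts; adjust to your project.
    manifest_path: "./target/manifest.json"
    catalog_path: "./target/catalog.json"
    target_platform: snowflake   # assumption: the warehouse dbt runs against
    enable_meta_mapping: true
    meta_mapping:
      important_tag:
        match: "important_tag"
        operation: "add_tag"
        config:
          tag: "important_tag"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"   # assumed local DataHub instance
```

Saved as, say, recipe.yml (a hypothetical file name), this would be run with datahub ingest -c recipe.yml.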

If these configurations still do not meet your requirements, you might need to customize the ingestion logic or extend the dbt ingestion source to better handle your specific use case.

Would you like more detailed guidance on customizing the ingestion logic or any other specific aspect?
