Understanding the 'dev' Environment in DataHub Configuration

Original Slack Thread

(urn:li:dataPlatform:dbt,dev.catalog.interim.schema,PROD)']},

Can anyone tell me what this dev. prefix means, and where it is configured?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


To better assist you, could you please provide the following details:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub (e.g., BigQuery)?

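Based on the URN fragment you shared, (urn:li:dataPlatform:dbt,dev.catalog.interim.schema,PROD), the dev prefix most likely comes from a platform instance, because DataHub prepends platform_instance to the dataset name. It could also simply be part of the dataset name itself, for example a database or catalog named dev. It is not the environment field, which appears separately at the end of the URN.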

Explanation:

  • urn:li:dataPlatform:dbt: This indicates that the data platform is dbt.
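  • dev.catalog.interim.schema: This is the dataset name. The leading dev is either a platform_instance prefix or the first component of the name as dbt reports it; the remaining components are the qualified name of the dbt model or source.
  • PROD: This is the env field of the URN. It defaults to PROD when the recipe does not set env.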
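To make the breakdown above concrete, here is a minimal Python sketch that splits a dataset URN into its three fields. It assumes the fragment from the thread is wrapped in the usual urn:li:dataset:(...) form; the helper name split_dataset_urn is hypothetical and is not part of the DataHub SDK.

```python
def split_dataset_urn(urn: str) -> dict:
    """Split a DataHub dataset URN into platform, name, and env (illustrative only)."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn!r}")
    inner = urn[len(prefix):-1]
    # The platform URN comes first and env comes last,
    # so the dataset name is whatever sits between them.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    return {
        "platform": platform_urn.rsplit(":", 1)[-1],
        "name": name,
        "env": env,
    }


# Assumed full form of the fragment quoted in the thread.
parts = split_dataset_urn(
    "urn:li:dataset:(urn:li:dataPlatform:dbt,dev.catalog.interim.schema,PROD)"
)
print(parts)
```

Seen this way, dev belongs to the name field, while the environment is the separate trailing PROD.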

Configuration:

  • Environment (env): Historically used to represent the environment such as DEV, ACC, PROD, etc. However, it is being deprecated in favor of platform_instance.
  • Platform Instance (platform_instance): Represents specific instances of a platform within an environment. For example, you might have multiple instances within the PROD environment.
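The sketch below illustrates how a platform instance ends up as a prefix on the dataset name while env stays a separate field. It is a self-contained approximation written for this thread, not the SDK's code; the function name build_dataset_urn is hypothetical, and treating dev as the platform instance is an assumption about the original poster's recipe. The DataHub Python SDK provides make_dataset_urn_with_platform_instance in datahub.emitter.mce_builder for real use.

```python
from typing import Optional


def build_dataset_urn(
    platform: str,
    name: str,
    platform_instance: Optional[str] = None,
    env: str = "PROD",  # used when the recipe does not set env
) -> str:
    """Approximate how a dataset URN is assembled (illustrative only)."""
    if platform_instance:
        # The platform instance is prepended to the dataset name.
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Assumption: the recipe set platform_instance to "dev" and left env unset.
with_instance = build_dataset_urn("dbt", "catalog.interim.schema", platform_instance="dev")
without_instance = build_dataset_urn("dbt", "catalog.interim.schema")
print(with_instance)
print(without_instance)
```

If the first URN matches what you see in DataHub, the dev prefix is set by platform_instance in the ingestion recipe; if your recipe has no platform_instance, the prefix is coming from the dataset name that dbt reports.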

Deprecation Note:
According to a Slack discussion, the env parameter is being deprecated in favor of platform_instance, which is more flexible and handles more complex scenarios. However, env is not being removed immediately and will remain supported for a while.

Example Configuration:
Here is an example of how you might configure a dbt ingestion recipe using platform_instance:

source:
  type: dbt
  config:
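    platform_instance: "core_finance_dev"  # prepended to dataset names in the resulting URNs
    target_platform: "bigquery"  # the platform that dbt runs against
    env: "DEV"  # You can still specify this for now, but plan to transition away from it
    # other dbt-specific configurations (e.g. manifest_path, catalog_path)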
sink:
  type: datahub-rest
  config:
    server: '${DATAHUB_GMS_HOST}'
    token: '${DATAHUB_GMS_TOKEN}'

Would you like more detailed guidance on how to transition from env to platform_instance or any other specific aspect?
