Storing and Retrieving Secrets in DataHub for Ingestion Sources

Original Slack Thread

<@U06TM7M40RK> How does DataHub store and retrieve secrets for ingestion sources?

DataHub provides a mechanism to securely store and retrieve secrets used in ingestion sources. Here’s a summary of how it works:

Storing Secrets

  1. Creating Secrets: Secrets are created via the DataHub UI. Navigate to the Ingestion tab, then to the Secrets tab, and click Create new secret. You can provide a unique name for the secret along with the value to be encrypted and an optional description. This process securely stores the secret within DataHub’s storage layer.

    Example of a recipe that references two such secrets:

    source:
        type: mysql
        config:
            host_port: 'localhost:3306'
            database: my_db
            username: ${MYSQL_USERNAME}
            password: ${MYSQL_PASSWORD}
            include_tables: true
            include_views: true
            profiling:
                enabled: true
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub-gms:8080'
    

    In this example, ${MYSQL_USERNAME} and ${MYSQL_PASSWORD} are placeholders for secrets that have been created and stored securely.

Retrieving Secrets

  1. Referencing Secrets: Once a secret is created, it can be referenced in your ingestion recipes using variable substitution. When the ingestion source executes, DataHub resolves these secrets by substituting the placeholders with their decrypted values.
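The substitution step can be pictured with a small sketch. This is only an illustration of the idea, not DataHub's actual resolver: the function name `resolve_secrets`, the in-memory `decrypted` dictionary, and the choice to leave unknown placeholders untouched are all assumptions made for the example.

```python
import re

# Matches ${NAME} placeholders such as ${MYSQL_PASSWORD}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_secrets(recipe_text, secrets):
    """Replace each ${NAME} in the recipe with its value from `secrets`.

    Placeholders with no matching secret are left as-is (an assumption
    made for this sketch, not confirmed DataHub behavior).
    """
    def substitute(match):
        name = match.group(1)
        return secrets.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, recipe_text)


# Hypothetical recipe fragment and hypothetical decrypted values.
recipe = "username: ${MYSQL_USERNAME}\npassword: ${MYSQL_PASSWORD}\n"
decrypted = {"MYSQL_USERNAME": "ingest_user", "MYSQL_PASSWORD": "s3cr3t"}

resolved = resolve_secrets(recipe, decrypted)
print(resolved)
```

The resolved text exists only in memory here, which mirrors the point that decrypted values are needed only for the duration of the run.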

Security Considerations

  • Storage: Secrets are encrypted and stored within DataHub’s storage layer. Decrypted values are substituted into the recipe only at execution time and are not persisted to disk beyond it.
  • Access Control: Only DataHub users with the Manage Secrets platform privilege can retrieve plaintext secret values using the GraphQL API.
  • Environment Variables: For production use cases, sensitive configuration values can be kept out of your ingestion recipe by referencing environment variables instead of hard-coding them.
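To make the access-control point more concrete, here is a hedged sketch of the request body a privileged client might send to the GraphQL API to read secret values. The query name `getSecretValues`, its `GetSecretValuesInput` shape, and the helper `build_secret_values_request` are assumptions for illustration; check them against the GraphQL schema of your DataHub version. The sketch only builds the JSON body and sends nothing over the network.

```python
import json

# Assumed query shape; verify against your DataHub version's GraphQL schema.
GET_SECRET_VALUES_QUERY = """
query getSecretValues($input: GetSecretValuesInput!) {
  getSecretValues(input: $input) {
    name
    value
  }
}
"""


def build_secret_values_request(secret_names):
    """Build the JSON body for a GraphQL request that reads secret values.

    Hypothetical helper: it prepares the payload only and does not call
    any DataHub endpoint.
    """
    return json.dumps(
        {
            "query": GET_SECRET_VALUES_QUERY,
            "variables": {"input": {"secrets": list(secret_names)}},
        }
    )


body = build_secret_values_request(["MYSQL_USERNAME", "MYSQL_PASSWORD"])
print(body)
```

A caller without the Manage Secrets privilege would be expected to have such a request rejected, which is why that privilege should be granted sparingly.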

Example Use Case

If you are integrating with Looker, create secrets named LOOKER_CLIENT_ID and LOOKER_CLIENT_SECRET, then reference them in your ingestion recipe as follows:

source:
    type: looker
    config:
        base_url: 'https://<your-looker-instance>.cloud.looker.com'
        client_id: ${LOOKER_CLIENT_ID}
        client_secret: ${LOOKER_CLIENT_SECRET}
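Before running a recipe like this, it can help to list which secrets it expects so that each one can be created first. The following is a minimal sketch under that assumption; `required_secrets` is a hypothetical helper, not part of DataHub, and it simply scans the recipe text for `${NAME}` placeholders.

```python
import re

# Matches ${NAME} placeholders such as ${LOOKER_CLIENT_ID}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def required_secrets(recipe_text):
    """Return the distinct placeholder names in order of first appearance."""
    seen = []
    for name in PLACEHOLDER.findall(recipe_text):
        if name not in seen:
            seen.append(name)
    return seen


looker_recipe = """
source:
    type: looker
    config:
        base_url: 'https://<your-looker-instance>.cloud.looker.com'
        client_id: ${LOOKER_CLIENT_ID}
        client_secret: ${LOOKER_CLIENT_SECRET}
"""

names = required_secrets(looker_recipe)
print(names)  # ['LOOKER_CLIENT_ID', 'LOOKER_CLIENT_SECRET']
```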

