Understanding and Implementing the `__DATAHUB_TO_FILE_` Directive in DataHub

Original Slack Thread

<@U06TM7M40RK> How do I use __DATAHUB_TO_FILE_?

The __DATAHUB_TO_FILE_ directive lets an ingestion recipe supply sensitive data to a source as a file: the value is written to a file, and the source receives the path to that file. This is useful for sources whose configuration expects file paths on the local file system, such as Kafka, BigQuery, and MySQL, and especially for scenarios where the ingestion process must be self-sufficient, like UI ingestion.

How to Use __DATAHUB_TO_FILE_

  1. Syntax: Write the directive as __DATAHUB_TO_FILE_<property>: <value>. It is converted into <property>: <path to file containing value>. The value can be specified inline, or through an environment variable or a secret.

  2. Example Configuration:

    • Suppose you need to use SSL with MySQL and have a sensitive key that should not be hard-coded in the recipe. You can use the directive as follows:
    source:
      type: mysql
      config:
        host_port: localhost:3306
        database: dbname
        username: root
        password: example
        options:
          connect_args:
            __DATAHUB_TO_FILE_ssl_key: '${secret}'  # Becomes ssl_key: <path to a file containing the secret>
    

    In this example, the contents of the secret are written to a file, and ssl_key is set to the path of that file.

  3. Purpose: Sensitive values stay out of the recipe text instead of being hard-coded, while sources that only accept file paths still receive one. The file is generated dynamically, which suits self-contained runs such as UI ingestion.
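
The conversion described in the steps above can be sketched in Python. This is only an illustration of the idea, not DataHub's actual implementation: the function name resolve_to_file_directives, the PREFIX constant, and the use of tempfile are all assumptions made for this example.

```python
import os
import tempfile
from typing import Any

# Prefix that marks a key whose value should be moved into a file.
PREFIX = "__DATAHUB_TO_FILE_"


def resolve_to_file_directives(config: Any) -> Any:
    """Hypothetical helper: recursively turn
    `__DATAHUB_TO_FILE_<property>: <value>` into
    `<property>: <path to a file containing value>`."""
    if isinstance(config, dict):
        resolved = {}
        for key, value in config.items():
            if isinstance(key, str) and key.startswith(PREFIX):
                # Write the value to a new temporary file and keep only its path.
                fd, path = tempfile.mkstemp()
                with os.fdopen(fd, "w") as handle:
                    handle.write(str(value))
                resolved[key[len(PREFIX):]] = path
            else:
                resolved[key] = resolve_to_file_directives(value)
        return resolved
    if isinstance(config, list):
        return [resolve_to_file_directives(item) for item in config]
    return config


# Usage: mirrors the MySQL recipe above, with a placeholder value for the secret.
source_config = {
    "host_port": "localhost:3306",
    "options": {
        "connect_args": {"__DATAHUB_TO_FILE_ssl_key": "placeholder-key-contents"}
    },
}
resolved_config = resolve_to_file_directives(source_config)
print(resolved_config["options"]["connect_args"])  # {'ssl_key': '<temp file path>'}
```

After the call, connect_args no longer contains the prefixed key. It holds ssl_key, pointing at a file whose contents are the original value.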

Would you like more detailed steps or additional information on any specific part of the process?

Hi <@U08PVDCPJAG>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.