Troubleshooting Snowflake Integration with DataHub through YAML Configuration

Original Slack Thread

Hello everyone, I have a question regarding the integration between Snowflake and DataHub. I have DataHub deployed on my computer, and so far the only way I can connect to Snowflake for my POC is external browser authentication. I have tried many YAML configurations but cannot find one that connects. I am using the CLI for ingestion and the latest version of DataHub.

source:
  type: snowflake
  config:
    account_id: ""
    username: ""
    authenticator: "EXTERNAL_BROWSER_AUTHENTICATOR"
    role: ""
    warehouse: ""
    include_table_lineage: true
    include_usage_stats: true
    include_technical_schema: true
    database_pattern:
      allow:
        - ".*"
      deny:
        - "^UTIL_DB$"
        - "^SNOWFLAKE$"
        - "^SNOWFLAKE_SAMPLE_DATA$"
      ignoreCase: true
    ignore_start_time_lineage: false
    upstream_lineage_in_report: false

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

<@U06TM7M40RK> Can you help with the questions in this thread?


To connect Snowflake to DataHub using external browser authentication, you need to ensure that your YAML configuration is correctly set up. Based on the provided information and the available data sources, here is a step-by-step guide to help you configure your YAML file correctly:

Summary
You need to use the EXTERNAL_BROWSER_AUTHENTICATOR for Snowflake authentication in your YAML configuration. Ensure that all required fields are correctly filled out and that the DataHub server URL is accurate.

Step-by-Step Guide

  1. Ensure Correct Configuration: Make sure your YAML configuration includes all necessary fields and is correctly formatted.

  2. Example YAML Configuration:

    source:
      type: snowflake
      config:
        account_id: "<your-snowflake-account-id>"
        username: "<your-username>"
        authenticator: "EXTERNAL_BROWSER_AUTHENTICATOR"
        role: "<your-role>"
        warehouse: "<your-warehouse>"
        include_table_lineage: true
        include_usage_stats: true
        include_technical_schema: true
        database_pattern:
          allow:
            - ".*"
          deny:
            - "^UTIL_DB$"
            - "^SNOWFLAKE$"
            - "^SNOWFLAKE_SAMPLE_DATA$"
          ignoreCase: true
        ignore_start_time_lineage: false
        upstream_lineage_in_report: false
    
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"
    
  3. Check for Common Issues:

    • Ensure that the account_id, username, role, and warehouse fields are correctly filled with your Snowflake account details.
    • Verify that the server URL in the sink section points to your DataHub instance.
  4. Testing the Connection:

    • After configuring the YAML file, test the connection to ensure that the credentials and configurations are correct.
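
One way to test the recipe is to run it through the DataHub CLI. The sketch below wraps that call in Python. It assumes the `datahub` CLI is installed and on your PATH; the helper names and the `recipe.yaml` path are placeholders for illustration, not names from this thread.

```python
import shutil
import subprocess


def build_ingest_command(recipe_path: str, dry_run: bool = False) -> list[str]:
    """Build the `datahub ingest` invocation for a recipe file."""
    command = ["datahub", "ingest", "-c", recipe_path]
    if dry_run:
        # --dry-run goes through the ingestion steps without writing to the sink.
        command.append("--dry-run")
    return command


def run_ingest(recipe_path: str, dry_run: bool = False) -> int:
    """Run the recipe and return the CLI exit code (non-zero means failure)."""
    if shutil.which("datahub") is None:
        raise RuntimeError("datahub CLI not found on PATH")
    return subprocess.run(build_ingest_command(recipe_path, dry_run)).returncode


# Example usage (placeholder path):
#   exit_code = run_ingest("recipe.yaml", dry_run=True)
```

With external browser authentication, expect a browser window to open for the Snowflake login while the command runs.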

<@U06TM7M40RK> I am getting the following error: using authenticator type ‘DEFAULT_AUTHENTICATOR’
Failed to configure the source (snowflake): 1 validation error for SnowflakeV2Config
authenticator
extra fields not permitted (type=value_error.extra)

The error you’re encountering indicates that the authenticator field is not recognized in the configuration. Instead, you should use the authentication_type field to specify the type of authenticator.

Corrected YAML Configuration
Replace the authenticator field with authentication_type in your YAML configuration. Here is the corrected version:

source:
  type: snowflake
  config:
    account_id: "<your-snowflake-account-id>"
    username: "<your-username>"
    authentication_type: "EXTERNAL_BROWSER_AUTHENTICATOR"
    role: "<your-role>"
    warehouse: "<your-warehouse>"
    include_table_lineage: true
    include_usage_stats: true
    include_technical_schema: true
    database_pattern:
      allow:
        - ".*"
      deny:
        - "^UTIL_DB$"
        - "^SNOWFLAKE$"
        - "^SNOWFLAKE_SAMPLE_DATA$"
      ignoreCase: true
    ignore_start_time_lineage: false
    upstream_lineage_in_report: false

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

Explanation

  1. Field Correction: The authenticator field is not valid. Use authentication_type instead.
  2. Configuration: Ensure all other fields are correctly filled with your Snowflake and DataHub details.
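
If you want to catch this kind of mistake before running ingestion, a small check over the parsed recipe can help. The sketch below is only an illustration: `KNOWN_FIXES` and `find_unsupported_options` are hypothetical names, the mapping covers only the one field discussed in this thread, and the recipe is assumed to be already loaded into a Python dict (for example with a YAML parser).

```python
# Hypothetical mapping: unsupported option name -> option to use instead.
# Only the field discussed in this thread is listed.
KNOWN_FIXES = {"authenticator": "authentication_type"}


def find_unsupported_options(recipe: dict) -> list[str]:
    """Return a hint for each known-unsupported key in the source config."""
    source_config = recipe.get("source", {}).get("config", {})
    return [
        f"replace '{old}' with '{new}'"
        for old, new in KNOWN_FIXES.items()
        if old in source_config
    ]


# Example usage with the recipe that triggered the validation error:
broken_recipe = {
    "source": {
        "type": "snowflake",
        "config": {"authenticator": "EXTERNAL_BROWSER_AUTHENTICATOR"},
    }
}
print(find_unsupported_options(broken_recipe))
```

This does not replace DataHub's own validation; it only flags the specific key from the error above.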

Additional Steps

  1. Verify Configuration: Double-check that all required fields are correctly filled.
  2. Test Connection: Run the ingestion job again to see if the issue is resolved.

<@U06TM7M40RK> I am getting this error: Command failed: Failed to configure the source (snowflake): ('Unable to get metadata from DataHub', {'message': '401 Client Error: Unauthorized for url: http://localhost:8080/api/graphql'})

The 401 Client Error: Unauthorized indicates that your DataHub instance is rejecting the request due to missing or incorrect authentication credentials. To resolve this, you need to provide a valid access token for DataHub.

Steps to Resolve the Issue

  1. Generate a Personal Access Token in the DataHub UI (Settings, then Access Tokens).

  2. Update Your YAML Configuration:

    • Add the generated token to your YAML configuration under the sink section.

Example YAML Configuration with Access Token

source:
  type: snowflake
  config:
    account_id: "<your-snowflake-account-id>"
    username: "<your-username>"
    authentication_type: "EXTERNAL_BROWSER_AUTHENTICATOR"
    role: "<your-role>"
    warehouse: "<your-warehouse>"
    include_table_lineage: true
    include_usage_stats: true
    include_technical_schema: true
    database_pattern:
      allow:
        - ".*"
      deny:
        - "^UTIL_DB$"
        - "^SNOWFLAKE$"
        - "^SNOWFLAKE_SAMPLE_DATA$"
      ignoreCase: true
    ignore_start_time_lineage: false
    upstream_lineage_in_report: false

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
    token: "<your-personal-access-token>"

Additional Steps

  1. Verify Token: Ensure that the token you generated has the necessary permissions to perform the required actions.
  2. Test Connection: Run the ingestion job again to verify that the issue is resolved.
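
To check the token separately from the ingestion run, you can send a request to the same endpoint that returned the 401. The sketch below uses only the Python standard library. It assumes the token is passed as an `Authorization: Bearer` header and uses a generic GraphQL query (`{ __typename }`); the function names are placeholders for illustration.

```python
import json
import urllib.error
import urllib.request


def build_graphql_request(server: str, token: str) -> urllib.request.Request:
    """Build a POST to <server>/api/graphql carrying the token as a Bearer header."""
    payload = json.dumps({"query": "{ __typename }"}).encode("utf-8")
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/graphql",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def token_is_accepted(server: str, token: str) -> bool:
    """Return False on a 401 response, True on a 200; re-raise other HTTP errors."""
    request = build_graphql_request(server, token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 401:
            return False
        raise


# Example usage (placeholder token):
#   token_is_accepted("http://localhost:8080", "<your-personal-access-token>")
```

If this returns False, the token itself is being rejected, so regenerate it before changing anything else in the recipe.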

Can you follow the above answer? I think it should fix your issue

Yes, the above answer fixed the issue. Thanks