Hi team, we have a strict network policy for Snowflake: users have to be in our office or on a VPN to log into Snowflake (UI or programmatic). We also use <https://docs.snowflake.com/en/user-guide/admin-security-privatelink|AWS PrivateLink> to ensure that we whitelist the IPs of third-party services. Services that log into Snowflake via PrivateLink need to use a different URL than users on the VPN.
We self-host DataHub on K8s, and in order to create a Snowflake integration in DataHub we need to use a PrivateLink URL. However, this means that the URLs DataHub produces to link out to Snowflake data sources point at the PrivateLink host, so they don’t work for individual users. Is there any way around this?
Hey Peter, I see, that’s a tricky one. I don’t think there’s an out-of-the-box way to override this behavior right now.
Your best bet would be to override the behavior in your own deployment.
This could be done by extending the snowflake source in metadata-ingestion.
I see the metadata-ingestion project has a SnowflakeSchemaGenerator class with a handful of methods like get_external_url_for_XXX. Those are the methods that generate those external links during ingestion time.
The snowsight_base_url used within those is pulled in from this method in snowflake_utils.py: create_snowsight_base_url(..., privatelink: bool)
If you’re able to extend the snowflake source, some solutions could be to:
Override this method, or
In the parent caller get_snowsight_base_url(...), pass in the parameters that will generate a URL that works for your users
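To make the first option a bit more concrete, here’s a rough sketch of the idea, not the actual DataHub code. snowsight_base_url below is a hypothetical stand-in for create_snowsight_base_url, and both URL shapes are assumptions you should check against what your users really open in the browser. The point is only that the user-facing link is always built with privatelink=False, regardless of how the ingestion connection itself is configured.

```python
def snowsight_base_url(account_locator: str, region: str, privatelink: bool) -> str:
    """Hypothetical stand-in for the real URL builder.

    Both URL shapes here are assumptions for illustration only.
    """
    if privatelink:
        return (
            f"https://app.{account_locator}.{region}"
            ".privatelink.snowflakecomputing.com/"
        )
    return f"https://app.snowflake.com/{region}/{account_locator}/"


def user_facing_base_url(
    account_locator: str, region: str, connection_uses_privatelink: bool
) -> str:
    """Build the link base for humans on the VPN.

    The connection flag is deliberately ignored: ingestion may connect
    over PrivateLink, but the generated links should not.
    """
    return snowsight_base_url(account_locator, region, privatelink=False)


if __name__ == "__main__":
    # Made-up account locator and region.
    print(user_facing_base_url("abc12345", "us-east-1", connection_uses_privatelink=True))
```

In a real extension you’d apply the same idea inside your subclass of the snowflake source, wherever snowsight_base_url gets computed.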
If scripting is easier than extending the ingestion source, I think another option could be to:
Turn off the include_external_url flag in the snowflake source config
Write a script that runs on a schedule to add the externalUrl field to the datasetProperties aspect for all snowflake tables (<DataHub CLI | DataHub updating aspects api docs>).
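For the scripting route, something along these lines might work as a starting point. It’s a sketch under a few assumptions: dataset URNs look like urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,ENV) with no platform instance prefix, Snowsight wants upper-case identifiers, and the #/data/databases/... path matches what your users see in Snowsight. external_url_for and set_external_url are hypothetical helper names. As far as I know the SDK calls used (DataHubGraph.get_aspect, MetadataChangeProposalWrapper, DatasetPropertiesClass) exist in acryl-datahub, but please verify them against your installed version.

```python
import re
from typing import Optional

# Assumed URN shape: urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,ENV)
DATASET_URN_RE = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:snowflake,([^,]+),[^,)]+\)$"
)


def external_url_for(urn: str, base_url: str) -> Optional[str]:
    """Return a user-facing table link, or None if the URN is not db.schema.table."""
    match = DATASET_URN_RE.match(urn)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if len(parts) != 3:
        return None
    # Assumption: Snowsight expects upper-case identifiers in the path.
    database, schema, table = (part.upper() for part in parts)
    return (
        f"{base_url.rstrip('/')}/#/data/databases/{database}"
        f"/schemas/{schema}/table/{table}/"
    )


def set_external_url(gms_server: str, urn: str, base_url: str) -> None:
    """Read the current datasetProperties, set externalUrl, and write it back."""
    # Imported lazily so external_url_for works without the SDK installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    url = external_url_for(urn, base_url)
    if url is None:
        return
    graph = DataHubGraph(DatahubClientConfig(server=gms_server))
    # Start from the existing aspect so other properties are not wiped out.
    props = graph.get_aspect(entity_urn=urn, aspect_type=DatasetPropertiesClass)
    if props is None:
        props = DatasetPropertiesClass()
    props.externalUrl = url
    graph.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=props))
```

Since each ingestion run rewrites datasetProperties, you’d likely want to schedule this to run right after ingestion finishes.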