<@U06TM7M40RK>, this week we have a workshop on implementing DataHub in our system. We use Azure as our cloud provider and Snowflake as our database.
What do we need to pay attention to, and what are the requirements for implementing DataHub in our company without any problems?
To implement DataHub in your system using Azure as your cloud provider and Snowflake as your database, there are several key considerations and requirements to ensure a smooth deployment:
- Deployment on Azure:
  - Kubernetes Setup: It is recommended to deploy DataHub on Azure Kubernetes Service (AKS) for scalability and reliability. You can follow the <https://datahubproject.io/docs/0.14.0/deploy/azure|Azure deployment guide> to set up a Kubernetes cluster and deploy DataHub using Helm charts.
  - Resource Configuration: Ensure your AKS cluster has sufficient resources. A typical setup might include 3 worker nodes of type Standard_D4s_v3 (4 vCPUs, 16 GiB memory each), as mentioned in the <Slack discussion>.
- Snowflake Integration:
  - User and Role Setup: You need to create a DataHub-specific role in Snowflake with the privileges required to read metadata. This includes granting `operate`, `usage`, and `select` privileges on the warehouse, database, and schemas. Detailed steps can be found in the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/snowflake#prerequisites|Snowflake prerequisites guide>.
  - Ingestion Configuration: Configure the Snowflake ingestion source in DataHub by setting up a connection using the DataHub UI or CLI. Ensure you have the correct credentials and permissions configured as per the <https://datahubproject.io/docs/0.14.0/quick-ingestion-guides/snowflake/configuration|Snowflake configuration guide>.
- Security and Access:
  - SSO Integration: Consider integrating Single Sign-On (SSO) with Azure Active Directory for authentication, using Azure AD as the OIDC provider for the DataHub frontend. The linked <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/azure-ad|Azure AD documentation> describes the related `azure-ad` ingestion source, which syncs Azure AD users and groups into DataHub.
- DataHub Components:
  - Frontend and Backend Configuration: Ensure that the DataHub frontend is properly configured and exposed via an ingress controller. Use PostgreSQL as the storage layer for DataHub, as recommended in the <Slack discussion>.
- Monitoring and Maintenance:
  - Resource Monitoring: Regularly monitor the performance and resource usage of your DataHub deployment to ensure it meets your organizational needs.
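As a minimal sketch of the Snowflake side, the Python below prints the kind of read-only statements the prerequisites guide describes (`OPERATE`/`USAGE` on the warehouse, `USAGE` on the database and schemas, `SELECT` on tables and views) and builds a matching ingestion recipe. The role, user, warehouse, database, account and GMS URL values are hypothetical placeholders, the statement list is an assumption rather than the guide's exact script, and the recipe dict only mirrors the structure of a recipe YAML. Check the guide for anything beyond this that your setup needs, such as extra access for lineage and usage statistics.

```python
import json

# Hypothetical placeholder names: replace with your own Snowflake objects.
ROLE = "datahub_role"
USER = "datahub_user"
WAREHOUSE = "DATAHUB_WH"
DATABASE = "ANALYTICS_DB"


def datahub_grants(role: str, user: str, warehouse: str, database: str) -> list:
    """Build read-only Snowflake statements for a DataHub metadata role (a sketch)."""
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        # OPERATE and USAGE on the warehouse let DataHub run its metadata queries.
        f"GRANT OPERATE, USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        # USAGE on the database and on current and future schemas makes objects visible.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {database} TO ROLE {role};",
        # SELECT on tables and views, e.g. for profiling.
        f"GRANT SELECT ON ALL TABLES IN DATABASE {database} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE TABLES IN DATABASE {database} TO ROLE {role};",
        f"GRANT SELECT ON ALL VIEWS IN DATABASE {database} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE VIEWS IN DATABASE {database} TO ROLE {role};",
        # Dedicated service user; its authentication is configured separately.
        f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role} DEFAULT_WAREHOUSE = {warehouse};",
        f"GRANT ROLE {role} TO USER {user};",
    ]


def snowflake_recipe(account_id: str, user: str, role: str, warehouse: str, gms_url: str) -> dict:
    """Build a DataHub ingestion recipe as a dict (same structure as the recipe YAML)."""
    return {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": account_id,
                "warehouse": warehouse,
                "username": user,
                # Resolved from the environment at ingestion time; never hard-code it.
                "password": "${SNOWFLAKE_PASSWORD}",
                "role": role,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


if __name__ == "__main__":
    for statement in datahub_grants(ROLE, USER, WAREHOUSE, DATABASE):
        print(statement)
    # "my_account" and the GMS URL are hypothetical placeholders.
    recipe = snowflake_recipe("my_account", USER, ROLE, WAREHOUSE, "http://datahub-gms:8080")
    print(json.dumps(recipe, indent=2))
```

Run the printed statements with a role that is allowed to create roles and users, then use the same role, user and warehouse names in the recipe.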
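For the SSO point, here is a minimal sketch assuming DataHub's OIDC settings are passed to the `datahub-frontend` container as environment variables (`AUTH_OIDC_ENABLED`, `AUTH_OIDC_CLIENT_ID`, `AUTH_OIDC_CLIENT_SECRET`, `AUTH_OIDC_DISCOVERY_URI`, `AUTH_OIDC_BASE_URL`). The tenant ID, client ID and base URL are hypothetical placeholders. The redirect URI to register on the Azure AD app registration is the DataHub base URL followed by `/callback/oidc`. Treat this as a starting point and confirm the variable names against the OIDC guide for your DataHub version.

```python
# Hypothetical placeholders: tenant ID, client ID and DataHub URL are not from the thread.
def azure_ad_oidc_env(tenant_id: str, client_id: str, base_url: str) -> dict:
    """Environment variables for datahub-frontend to use Azure AD as its OIDC provider."""
    return {
        "AUTH_OIDC_ENABLED": "true",
        "AUTH_OIDC_CLIENT_ID": client_id,
        # Inject the real secret from a Kubernetes secret; this value is only a marker.
        "AUTH_OIDC_CLIENT_SECRET": "<set-from-kubernetes-secret>",
        # Azure AD publishes its OIDC metadata in this per-tenant discovery document.
        "AUTH_OIDC_DISCOVERY_URI": (
            f"https://login.microsoftonline.com/{tenant_id}/v2.0/.well-known/openid-configuration"
        ),
        "AUTH_OIDC_BASE_URL": base_url.rstrip("/"),
    }


def azure_redirect_uri(base_url: str) -> str:
    """Redirect URI to register on the Azure AD app registration."""
    return base_url.rstrip("/") + "/callback/oidc"


if __name__ == "__main__":
    base = "https://datahub.example.com"
    for key, value in azure_ad_oidc_env("<tenant-id>", "<client-id>", base).items():
        print(f"{key}={value}")
    print("Redirect URI:", azure_redirect_uri(base))
```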
Would you like more detailed steps or specific configurations for any of these components?
Sources:
- <https://datahubproject.io/docs/0.14.0/deploy/azure|Azure deployment guide>
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/snowflake#prerequisites|Snowflake prerequisites guide>
- <https://datahubproject.io/docs/0.14.0/quick-ingestion-guides/snowflake/configuration|Snowflake configuration guide>
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/azure-ad|Azure AD documentation>
- <Slack discussion on production deployment>
Hint: Mention @runllm in the thread for followups.
Can you give me more information about this? Which services do I need, and which permissions are required on the Azure side and in Snowflake? The links above don't work.
In Azure, which permissions are needed and which services do I need for DataHub?
<@U06TM7M40RK> In Azure, which permissions are needed and which services do I need for DataHub?
To implement DataHub on Azure, you will need to set up certain services and ensure that you have the appropriate permissions. Here’s a detailed breakdown:
Azure Services Required
- Azure Kubernetes Service (AKS):
  - Purpose: AKS is used to deploy and manage the DataHub application in a scalable and reliable manner.
  - Setup: You can create an AKS cluster using the Azure CLI. Ensure that you have the `kubectl` and `helm` tools installed to manage Kubernetes resources and deploy applications using Helm charts.
- Azure Active Directory (Azure AD):
  - Purpose: Azure AD is used for managing user identities and enabling Single Sign-On (SSO) for DataHub.
  - Setup: You will need to create an Azure AD application for DataHub to handle authentication and authorization.
- Azure Database for PostgreSQL (optional):
  - Purpose: If you choose to use PostgreSQL as the backend database for DataHub, you can use Azure Database for PostgreSQL to host it.
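A minimal dry-run sketch of the AKS and Helm setup described above: it assembles the Azure CLI and Helm commands as argument lists and prints them instead of running them. The resource group, cluster name and region are hypothetical; the node size and count reuse the 3 Standard_D4s_v3 nodes suggested earlier in the thread; and the Helm repository and chart names (`datahub/datahub-prerequisites`, `datahub/datahub`) are those of DataHub's public Helm charts. The DataHub Helm guide also creates Kubernetes secrets (such as the database password) before installing the prerequisites chart; that step is left out here.

```python
import shlex

# Hypothetical names: resource group, cluster and region are placeholders.
RESOURCE_GROUP = "datahub-rg"
CLUSTER = "datahub-aks"
LOCATION = "westeurope"


def deployment_commands(resource_group: str, cluster: str, location: str) -> list:
    """Return the CLI steps as argument lists; nothing is executed here."""
    return [
        ["az", "group", "create", "--name", resource_group, "--location", location],
        # 3 x Standard_D4s_v3 worker nodes, with a managed identity for the cluster.
        ["az", "aks", "create", "--resource-group", resource_group, "--name", cluster,
         "--node-count", "3", "--node-vm-size", "Standard_D4s_v3",
         "--enable-managed-identity", "--generate-ssh-keys"],
        # Merges the cluster credentials into ~/.kube/config so kubectl and helm can connect.
        ["az", "aks", "get-credentials", "--resource-group", resource_group, "--name", cluster],
        ["helm", "repo", "add", "datahub", "https://helm.datahubproject.io/"],
        # Prerequisites chart (Kafka, Elasticsearch, a SQL database, ...) first, then DataHub.
        ["helm", "install", "prerequisites", "datahub/datahub-prerequisites"],
        ["helm", "install", "datahub", "datahub/datahub"],
    ]


if __name__ == "__main__":
    # Dry run: print each command. To execute one, pass it to subprocess.run(cmd, check=True).
    for cmd in deployment_commands(RESOURCE_GROUP, CLUSTER, LOCATION):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell-quoting issues and makes it easy to review the plan in the workshop before running anything.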
Permissions Required
- Azure Kubernetes Service (AKS):
  - Permissions: You need permissions to create and manage Kubernetes clusters. This typically requires the `Contributor` role on the Azure subscription or resource group where the AKS cluster will be created.
- Azure Active Directory (Azure AD):
  - Permissions: You need permissions to create and manage applications in Azure AD. This typically requires the `Application Administrator` or `Global Administrator` role in Azure AD.
- Azure Resource Management:
  - Permissions: You need permissions to create and manage resources such as resource groups, virtual networks, and other infrastructure components. This typically requires the `Contributor` role on the Azure subscription or resource group.
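As a sketch of the resource-group-scoped `Contributor` assignment above, the helper below builds an `az role assignment create` command. The assignee, subscription ID and resource group are hypothetical placeholders. Note that `Application Administrator` and `Global Administrator` are Azure AD directory roles, not Azure RBAC roles, so they are assigned in Azure AD (for example in the portal) rather than with this command.

```python
import shlex


def contributor_assignment(assignee: str, subscription_id: str, resource_group: str) -> list:
    """Build an `az role assignment create` call granting Contributor on one resource group."""
    # Scoping to the resource group rather than the whole subscription limits the blast radius.
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return ["az", "role", "assignment", "create",
            "--assignee", assignee, "--role", "Contributor", "--scope", scope]


if __name__ == "__main__":
    # Hypothetical placeholders; the assignee can be a user sign-in name or an object ID.
    cmd = contributor_assignment("platform-engineer@example.com", "<subscription-id>", "datahub-rg")
    print(shlex.join(cmd))
```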
Additional Considerations
- Managed Identity: Consider using Azure Managed Identity for secure and seamless authentication to Azure services without managing credentials.
- Network Configuration: Ensure that your AKS cluster is properly networked and secured, possibly using Azure Virtual Network and Network Security Groups.
Would you like more detailed steps on setting up any of these services or configuring permissions?
Sources:
- <https://datahubproject.io/docs/0.14.0/deploy/azure|Azure deployment guide>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><