Requirements and Considerations for Implementing DataHub with Azure and Snowflake

Original Slack Thread

<@U06TM7M40RK>, this week we have a workshop to implement DataHub in our system. We use Azure as our cloud provider and Snowflake as our database.
What do we have to pay attention to, and what are the requirements for implementing DataHub in our company without any problems?

To implement DataHub in your system using Azure as your cloud provider and Snowflake as your database, there are several key considerations and requirements to ensure a smooth deployment:

  1. Deployment on Azure:

    • Kubernetes Setup: It is recommended to deploy DataHub on Azure Kubernetes Service (AKS) for scalability and reliability. You can follow the <https://datahubproject.io/docs/0.14.0/deploy/azure|Azure deployment guide> to set up a Kubernetes cluster and deploy DataHub using Helm charts.
    • Resource Configuration: Ensure your AKS cluster has sufficient resources. A typical setup might include 3 worker nodes with specifications like Standard_D4s_v3 (4 vCPUs, 16 GiB Memory) as mentioned in the <Slack discussion>.
  2. Snowflake Integration:

  3. Security and Access:

  4. DataHub Components:

    • Frontend and Backend Configuration: Ensure that the DataHub frontend is properly configured and exposed via an ingress controller. Use PostgreSQL as the storage layer for DataHub, as recommended in the <Slack discussion>.
  5. Monitoring and Maintenance:

    • Resource Monitoring: Regularly monitor the performance and resource usage of your DataHub deployment to ensure it meets your organizational needs.
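
The Python below is a rough sketch of the Helm-based deployment from item 1. It assembles the helm commands and prints them, running in dry-run mode by default. It assumes the public DataHub chart repository at https://helm.datahubproject.io/ and its `datahub-prerequisites` and `datahub` charts. The release names and the `datahub` namespace are illustrative choices. Two steps are not shown. The DataHub Kubernetes guide also has you create Kubernetes secrets for the backing database before installing. Using PostgreSQL or an ingress requires chart value overrides.

```python
import shlex
import subprocess

# Assumption: public DataHub Helm repository; check the current DataHub docs.
DATAHUB_HELM_REPO = "https://helm.datahubproject.io/"


def helm_deploy_commands(namespace: str = "datahub") -> list[list[str]]:
    """Hypothetical helper: helm commands for a basic DataHub install."""
    return [
        ["helm", "repo", "add", "datahub", DATAHUB_HELM_REPO],
        ["helm", "repo", "update"],
        # Backing services (search, Kafka, SQL database) with chart defaults.
        ["helm", "install", "prerequisites", "datahub/datahub-prerequisites",
         "--namespace", namespace, "--create-namespace"],
        # DataHub itself (GMS, frontend, setup jobs) in the same namespace.
        ["helm", "install", "datahub", "datahub/datahub",
         "--namespace", namespace],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(helm_deploy_commands())
```

Running the file only prints the commands. Call `run(helm_deploy_commands(), dry_run=False)` only after `kubectl` points at the AKS cluster.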

Would you like more detailed steps or specific configurations for any of these components?

Sources:

  • <Slack discussion on production deployment>

Can you give me more information regarding this? Which services do I need, which permissions are required on the Azure side, and which in Snowflake? The links above don't work.

<@U06TM7M40RK> In Azure, which permissions are needed, and which services do I need for DataHub?

To implement DataHub on Azure, you will need to set up certain services and ensure that you have the appropriate permissions. Here’s a detailed breakdown:

Azure Services Required

  1. Azure Kubernetes Service (AKS):

    • Purpose: AKS is used to deploy and manage the DataHub application in a scalable and reliable manner.
    • Setup: You can create an AKS cluster using the Azure CLI. Ensure that you have the kubectl and helm tools installed to manage Kubernetes resources and deploy applications using Helm charts.
  2. Azure Active Directory (Azure AD):

    • Purpose: Azure AD is used for managing user identities and enabling Single Sign-On (SSO) for DataHub.
    • Setup: You will need to create an Azure AD application for DataHub to handle authentication and authorization.
  3. Azure Database for PostgreSQL (optional):

    • Purpose: If you choose to use PostgreSQL as the backend database for DataHub, you can use Azure Database for PostgreSQL to host it.
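
For the Azure AD application, DataHub's frontend can be connected to Azure AD through its OIDC settings. The sketch below builds two things: the environment variables for the `datahub-frontend` container, and the redirect URI to register on the Azure AD application. It assumes the `AUTH_OIDC_*` variable names from the DataHub OIDC documentation and the standard Azure AD v2.0 discovery endpoint. The tenant ID, client ID, secret and domain are placeholders. In a real deployment the client secret belongs in a Kubernetes secret, not in code.

```python
def azure_ad_discovery_uri(tenant_id: str) -> str:
    """OpenID Connect discovery document for an Azure AD (v2.0) tenant."""
    return (f"https://login.microsoftonline.com/{tenant_id}"
            "/v2.0/.well-known/openid-configuration")


def oidc_redirect_uri(base_url: str) -> str:
    """Redirect URI to register on the Azure AD app (DataHub's OIDC callback path)."""
    return base_url.rstrip("/") + "/callback/oidc"


def datahub_frontend_oidc_env(tenant_id: str, client_id: str,
                              client_secret: str, base_url: str) -> dict[str, str]:
    """datahub-frontend environment (variable names assumed from DataHub's OIDC docs)."""
    return {
        "AUTH_OIDC_ENABLED": "true",
        "AUTH_OIDC_CLIENT_ID": client_id,
        # Placeholder: load this from a Kubernetes secret in practice.
        "AUTH_OIDC_CLIENT_SECRET": client_secret,
        "AUTH_OIDC_DISCOVERY_URI": azure_ad_discovery_uri(tenant_id),
        "AUTH_OIDC_BASE_URL": base_url.rstrip("/"),
        "AUTH_OIDC_SCOPE": "openid profile email",
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    base = "https://datahub.example.com"
    env = datahub_frontend_oidc_env("my-tenant-id", "my-client-id",
                                    "my-client-secret", base)
    for key, value in env.items():
        shown = "***" if key.endswith("SECRET") else value  # mask the secret
        print(f"{key}={shown}")
    print("Redirect URI:", oidc_redirect_uri(base))
```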

Permissions Required

  1. Azure Kubernetes Service (AKS):

    • Permissions: You need permissions to create and manage Kubernetes clusters. This typically requires the Contributor role on the Azure subscription or resource group where the AKS cluster will be created.
  2. Azure Active Directory (Azure AD):

    • Permissions: You need permissions to create and manage applications in Azure AD. This typically requires the Application Administrator or Global Administrator role in Azure AD.
  3. Azure Resource Management:

    • Permissions: You need permissions to create and manage resources such as resource groups, virtual networks, and other infrastructure components. This typically requires the Contributor role on the Azure subscription or resource group.
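
The Contributor role can be granted on a single resource group instead of the whole subscription. The Python below builds the corresponding `az role assignment create` call. The subscription ID, resource group and assignee are placeholders. Whoever runs the command typically needs Owner or User Access Administrator on that scope. Azure AD directory roles such as Application Administrator are not Azure RBAC roles. They are assigned in Azure AD itself, not with this command.

```python
import shlex


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Azure RBAC scope string for a single resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def contributor_assignment_cmd(assignee: str, subscription_id: str,
                               resource_group: str) -> list[str]:
    """az CLI call granting Contributor on one resource group only."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,  # user, group or service principal
        "--role", "Contributor",
        "--scope", resource_group_scope(subscription_id, resource_group),
    ]


if __name__ == "__main__":
    # Placeholder values; replace them before running the printed command.
    cmd = contributor_assignment_cmd("admin@example.com",
                                     "00000000-0000-0000-0000-000000000000",
                                     "rg-datahub")
    print(shlex.join(cmd))
```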

Additional Considerations

  • Managed Identity: Consider using Azure Managed Identity for secure and seamless authentication to Azure services without managing credentials.
  • Network Configuration: Ensure that your AKS cluster is properly networked and secured, possibly using Azure Virtual Network and Network Security Groups.
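
Both considerations map onto options of `az aks create`. The sketch below builds the cluster-creation and credential commands for the three-node Standard_D4s_v3 layout mentioned earlier in the thread. It enables a managed identity and, when a subnet ID is supplied, uses Azure CNI networking inside an existing virtual network. The resource group, cluster name and subnet ID are placeholders. Recent Azure CLI versions already use a managed identity by default.

```python
import shlex
from typing import Optional


def aks_create_cmd(resource_group: str, cluster_name: str,
                   subnet_id: Optional[str] = None, node_count: int = 3,
                   vm_size: str = "Standard_D4s_v3") -> list[str]:
    """az CLI call for an AKS cluster sized as discussed in the thread."""
    cmd = [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
        "--node-vm-size", vm_size,
        "--enable-managed-identity",  # no service-principal secret to manage
        "--generate-ssh-keys",
    ]
    if subnet_id:
        # Azure CNI: nodes and pods get IPs from an existing VNet subnet.
        cmd += ["--network-plugin", "azure", "--vnet-subnet-id", subnet_id]
    return cmd


def aks_credentials_cmd(resource_group: str, cluster_name: str) -> list[str]:
    """Merge the cluster's kubeconfig so kubectl and helm target it."""
    return ["az", "aks", "get-credentials",
            "--resource-group", resource_group, "--name", cluster_name]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(shlex.join(aks_create_cmd("rg-datahub", "aks-datahub")))
    print(shlex.join(aks_credentials_cmd("rg-datahub", "aks-datahub")))
```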

Would you like more detailed steps on setting up any of these services or configuring permissions?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><