Configuring Ingress for DataHub Deployment on AWS, Azure, and GCP

Original Slack Thread

<@U06TM7M40RK> how to enable ingress in datahub

To enable ingress in DataHub, you need to configure your deployment to expose the DataHub frontend using an ingress controller. The process can vary depending on the cloud provider you are using. Here are the general steps for AWS, Azure, and GCP:

AWS

  1. Set Up Ingress Controller: Install the AWS Load Balancer Controller, which provisions an Application Load Balancer (ALB) for each ingress. The controller needs an IAM policy and a Kubernetes service account bound to that policy.

  2. Update values.yaml: Modify the values.yaml file to enable ingress for the DataHub frontend. Here is an example configuration:

    datahub-frontend:
      enabled: true
      image:
        repository: acryldata/datahub-frontend-react
        tag: "latest"
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/certificate-arn: <<certificate-arn>>
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
        hosts:
          - host: <<host-name>>
            paths:
              - /*
    
  3. Apply Configuration: Use Helm to apply the configuration:

    helm upgrade --install datahub datahub/datahub --values values.yaml
    
  4. Verify Setup: Check the ingress setup using:

    kubectl get ingress
    

For more details, refer to the AWS Deployment Guide (https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md).

Azure

  1. Set Up Application Gateway: Deploy an Azure Application Gateway and configure it with a WAF policy.

  2. Update values.yaml: Modify the values.yaml file to enable ingress for the DataHub frontend using Azure Application Gateway.

    datahub-frontend:
      enabled: true
      image:
        repository: acryldata/datahub-frontend-react
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: azure/application-gateway
          appgw.ingress.kubernetes.io/backend-protocol: "http"
        hosts:
          - paths:
              - /*
    
  3. Apply Configuration: Use Helm to apply the configuration:

    helm upgrade --install datahub datahub/datahub --values values.yaml
    
  4. Verify Setup: Check the ingress setup using:

    kubectl get ingress
    

For more details, refer to the Azure Deployment Guide (https://github.com/datahub-project/datahub/blob/master/docs/deploy/azure.md).

GCP

  1. Use GKE Ingress Controller: Set up ingress using the GKE ingress controller.

  2. Configure Ingress: Use the GCP console to create an ingress resource for the DataHub frontend.

For more details, refer to "Deploying to GCP" in the DataHub Deployment Guide.
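
As a declarative alternative to the console steps, a GKE ingress can also be written as a manifest. The following is only a minimal sketch and is not taken from the DataHub guide. The resource name is hypothetical, the Service name datahub-datahub-frontend assumes a Helm release named datahub, and the frontend Service is assumed to be reachable by the GKE load balancer (for example as a NodePort Service).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend-gke            # hypothetical name
  annotations:
    kubernetes.io/ingress.class: "gce"  # GKE selects its controller through this annotation
spec:
  rules:
    - http:
        paths:
          - path: /*
            pathType: ImplementationSpecific
            backend:
              service:
                name: datahub-datahub-frontend  # assumption: release named "datahub"
                port:
                  number: 9002                  # DataHub frontend port
```

Apply it with kubectl apply -f and check the assigned address with kubectl get ingress, as in the AWS and Azure steps.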

Would you like more detailed steps for a specific cloud provider or have any specific questions about the setup?

Hi <@U07353YLQP8>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.

<@U06TM7M40RK> how to configure ingress controller ?

To configure an ingress controller for DataHub, you need to follow specific steps depending on the cloud provider or Kubernetes environment you are using. Below are general steps for configuring an ingress controller on AWS, Azure, and GCP:

AWS

  1. Set Up the AWS Load Balancer Controller (formerly the ALB Ingress Controller):

    • IAM Policy: Download and create an IAM policy for the ALB controller.
      curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
      aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy --policy-document file://iam_policy.json
      
    • Service Account: Create a service account and attach the IAM policy.
      eksctl create iamserviceaccount \
        --cluster=<<cluster-name>> \
        --namespace=kube-system \
        --name=aws-load-balancer-controller \
        --attach-policy-arn=arn:aws:iam::<<account-id>>:policy/AWSLoadBalancerControllerIAMPolicy \
        --override-existing-serviceaccounts \
        --approve
      
    • Install ALB Controller: Use Helm to install the ALB controller.
      helm repo add eks https://aws.github.io/eks-charts
      helm repo update
      helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
        --set clusterName=<<cluster-name>> \
        --set serviceAccount.create=false \
        --set serviceAccount.name=aws-load-balancer-controller \
        -n kube-system
      
  2. Configure Ingress Resource:

    • Update your values.yaml to enable ingress for DataHub services.
    • Apply the configuration using Helm.

For more details, refer to the AWS Deployment Guide (https://github.com/datahub-project/datahub/blob/master/docs/deploy/aws.md).

Azure

  1. Set Up Azure Application Gateway Ingress Controller:

    • Deploy an Azure Application Gateway and configure it with a WAF policy.
    • Use Helm to install the Application Gateway Ingress Controller.

  2. Configure Ingress Resource:

    • Update your values.yaml to enable ingress for DataHub services.
    • Apply the configuration using Helm.

For more details, refer to the Azure Deployment Guide (https://github.com/datahub-project/datahub/blob/master/docs/deploy/azure.md).

GCP

  1. Use GKE Ingress Controller:

    • Set up ingress using the GKE ingress controller via the GCP console.
    • Create an ingress resource for the DataHub frontend.

For more details, refer to "Deploying to GCP" in the DataHub Deployment Guide.

Would you like more detailed steps for a specific cloud provider or have any specific questions about the setup?

<@U06TM7M40RK> how to define ingress-class name after implementing ingress object

To define an ingress-class name after implementing an ingress object, you need to specify the ingressClassName in your ingress resource configuration. This is important for directing traffic to the correct ingress controller, especially if you have multiple ingress controllers in your Kubernetes cluster.

Here’s how you can define the ingressClassName in your ingress resource:

Example Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend-ingress
spec:
  # The deprecated kubernetes.io/ingress.class annotation is omitted; spec.ingressClassName replaces it.
  ingressClassName: "nginx"  # Example value; replace with your ingress class name
  rules:
  - host: "your-domain.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: datahub-frontend  # Adjust to your Service name; Helm installs typically prefix it with the release name
            port:
              number: 9002

Steps to Implement

  1. Identify the Ingress Class: Determine the ingress class name that corresponds to your ingress controller. Common ingress class names include nginx, alb, gce, etc., depending on the ingress controller you are using.

  2. Update Ingress Resource: Add or update the ingressClassName field in your ingress resource YAML file to specify the desired ingress class.

  3. Apply the Configuration: Use kubectl to apply the updated ingress configuration:

    kubectl apply -f your-ingress-file.yaml
    
  4. Verify the Ingress Setup: Check the ingress resource to ensure it is configured correctly:

    kubectl get ingress datahub-frontend-ingress -o yaml
    

This configuration ensures that the ingress resource uses the specified ingress class, directing traffic to the appropriate ingress controller.
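
For step 1, it can help to see that an ingress class is itself a cluster resource: the value of ingressClassName must match the name of an IngressClass object. The sketch below shows what such an object looks like for the community ingress-nginx controller. Most controller Helm charts create it for you, so treat this as an illustration rather than something you normally write by hand. The name nginx is an assumption that matches the example above.

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx                           # the value referenced by spec.ingressClassName
  # Optional: make this class the default for ingresses that set no class.
  # annotations:
  #   ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx      # controller identifier used by ingress-nginx
```

Running kubectl get ingressclass lists the classes that already exist in the cluster.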

Would you like more detailed steps or have any specific questions about the setup?

How to Enable OIDC in the DataHub Web App

<@U06TM7M40RK> How to Enable OIDC in datahub web app

To enable OpenID Connect (OIDC) in the DataHub web application, you need to configure the DataHub frontend to use OIDC for authentication. Here are the steps to set up OIDC integration:

Steps to Enable OIDC in DataHub

  1. Register an Application with Your Identity Provider:

    • You need to register an application with your identity provider (e.g., Google, Okta, Azure AD) to obtain the Client ID, Client Secret, and Discovery URI.
    • For Google, navigate to the Google API Console, create a new project, and configure the OAuth2.0 consent screen and credentials.
    • For Okta, log in to your Okta admin account, navigate to the developer console, and create a new application with OpenID Connect as the sign-on method.
    • For Azure, use the Azure portal to register a new application and configure the redirect URI.

  2. Configure DataHub Frontend:

    • Update the datahub-frontend configuration with the necessary environment variables to enable OIDC authentication.

    Example configuration in values.yaml (these entries go under the datahub-frontend key):

    extraEnvs:
      - name: AUTH_OIDC_ENABLED
        value: "true"
      - name: AUTH_OIDC_CLIENT_ID
        value: "<client-id>"
      - name: AUTH_OIDC_CLIENT_SECRET
        valueFrom:
          secretKeyRef:
            name: datahub-oidc-secrets
            key: client_secret
      - name: AUTH_OIDC_DISCOVERY_URI
        value: "https://<your-identity-provider>/.well-known/openid-configuration"
      - name: AUTH_OIDC_BASE_URL
        value: "http://<datahub-frontend-url>"
      - name: AUTH_OIDC_JIT_PROVISIONING_ENABLED
        value: "true"
      - name: AUTH_OIDC_PRE_PROVISIONING_REQUIRED
        value: "false"
      - name: AUTH_OIDC_EXTRACT_GROUPS_ENABLED
        value: "true"
      - name: AUTH_OIDC_USER_NAME_CLAIM
        value: "preferred_username"
      - name: AUTH_OIDC_GROUPS_CLAIM
        value: "datahub-groups"
      - name: AUTH_JAAS_ENABLED
        value: "false"
      - name: AUTH_VERBOSE_LOGGING
        value: "true"
    
  3. Restart DataHub Frontend:

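    • After configuring the environment variables, restart datahub-frontend so that it picks up the change. For a Helm deployment, re-running helm upgrade with the updated values.yaml rolls out new frontend pods. For a Docker Compose deployment, recreate the container:
    docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react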
    
  4. Verify the Setup:

    • Log out and log back in through SSO to confirm the connection succeeded.
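
The values.yaml example in step 2 reads the client secret from a Kubernetes Secret named datahub-oidc-secrets with the key client_secret, so that Secret has to exist before the frontend pod starts. A minimal sketch of such a Secret follows. The placeholder value is hypothetical, and the Secret is assumed to live in the same namespace as the DataHub release.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: datahub-oidc-secrets     # must match secretKeyRef.name in values.yaml
type: Opaque
stringData:
  client_secret: "<client-secret>"   # placeholder; use the value issued by your identity provider
```

Avoid committing the real secret to version control; creating it out of band with kubectl create secret generic is a common alternative.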

For more detailed guidance, you can refer to the OIDC Authentication Guide (https://datahubproject.io/docs/0.13.1/authentication/guides/sso/configure-oidc-react).

Would you like more detailed guidance on any specific step?
