Seeking Help with DataHub Ingestion through MSSQL Integration

Original Slack Thread

Is anyone available to help?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

It sounds like you’re looking for assistance with a specific question related to DataHub. To get the most effective help, please provide the following information:

  1. Which DataHub version are you using? (e.g., 0.12.0)
  2. Are you using the UI or CLI for ingestion?
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)
  4. Any relevant error logs or specific issues you’re encountering.

For human assistance, the DataHub Slack community is a great place to get help from other users and the core team. You can join it by signing up here: https://datahubproject.io/slack?utm_source=docs&utm_medium=page_link&utm_campaign=docs_page_link

Once you’re in the Slack community, you can navigate to the appropriate channels based on your needs:

  • #troubleshoot: For general debugging help.
  • #getting-started: If you’re new and need guidance on where to begin.
  • #office-hours: For live support from the Core DataHub Team during their scheduled office hours.
  • #feature-requests: If you have suggestions on how to improve DataHub.
  • #contribute: If you’re looking to contribute to the DataHub project.

For more details on navigating the DataHub Slack, refer to the Slack guidelines (https://github.com/datahub-project/datahub/blob/master/docs/slack.md).

Would you like more detailed information on any specific aspect of the Slack community or how to get help there?


<@U06TM7M40RK> UI
version 0.14.0.4
MSSQL

Thank you for providing the details. Since you are using the UI for ingestion with DataHub version 0.14.0.4 and integrating with MSSQL, let’s address your issue.

Common Issues and Solutions for MSSQL Ingestion

  1. Encryption Settings:
    If your MSSQL server requires encrypted connections, your ingestion configuration must include the matching encryption settings. According to the DataHub MSSQL documentation (https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/mssql), you can use either the pytds library or ODBC for this.

    Using the pytds Library:

    source:
      type: mssql
      config:
        host_port: your_host:1433
        database: your_database
        username: your_username
        password: your_password
        include_views: true
        include_tables: true
        profiling:
          enabled: true
          profile_table_level_only: true
        stateful_ingestion:
          enabled: true
        options:
          connect_args:
            cafile: /path/to/your/ca.pem
            validate_host: true
    

    Using ODBC:

    source:
      type: mssql-odbc
      config:
        host_port: your_host:1433
        database: your_database
        username: your_username
        password: your_password
        use_odbc: "True"
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          Encrypt: "yes"
          TrustServerCertificate: "Yes"
          ssl: "True"
    
  2. Logs and Debugging:
    If you encounter errors during ingestion, check the logs for details. According to an earlier Slack conversation, you can view the ingestion logs by opening the ingestion run in the UI and clicking the “more info” button on the far left of that run.

  3. Certificate Management:
    If you are using the pytds library with a CA certificate file (cafile), that file must be readable from inside the pod that actually executes the ingestion. For ingestion triggered from the UI, this is the datahub-actions pod, not the GMS or frontend pods. In EKS (or any Kubernetes cluster), you can store the certificate in a Kubernetes secret and mount it into that pod.

    Steps to Copy Certificates to the datahub-actions Pod in EKS:

    kubectl create secret generic datahub-certs --from-file=ca.pem=/path/to/your/ca.pem -n your_namespace


    Modify the DataHub Helm Values:

    acryl-datahub-actions:
      # Mount the secret into the pod that runs UI-based ingestion.
      extraVolumes:
        - name: certs
          secret:
            secretName: datahub-certs
      extraVolumeMounts:
        - name: certs
          mountPath: /certs
          readOnly: true
    

    Update the Ingestion Recipe:

    source:
      type: mssql
      config:
        host_port: your_host:1433
        database: your_database
        username: your_username
        password: your_password
        include_views: true
        include_tables: true
        profiling:
          enabled: true
          profile_table_level_only: true
        stateful_ingestion:
          enabled: true
        options:
          connect_args:
            cafile: /certs/ca.pem
            validate_host: true

    Apply the Helm Chart:

    helm upgrade --install datahub datahub/datahub --values values.yaml
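Before re-running the ingestion from the UI, it can help to confirm that the pytds settings work from inside the pod that runs ingestion (for example after `kubectl exec` into the datahub-actions pod). The sketch below is a hypothetical preflight helper, not part of DataHub. It assumes the recipe's `config:` section has been loaded into a Python dict, maps its fields onto keyword arguments of `pytds.connect` (`server`, `port`, `database`, `user`, `password`, plus anything under `options.connect_args` such as `cafile` and `validate_host`), checks that the CA file exists and parses, and can optionally open one connection. DataHub itself builds a SQLAlchemy connection from the recipe rather than calling these functions, so treat the mapping as an approximation.

```python
import os
import ssl
from typing import Optional


def pytds_kwargs_from_recipe(config: dict) -> dict:
    """Hypothetical helper: translate recipe config keys into pytds.connect kwargs."""
    host, sep, port = config["host_port"].rpartition(":")
    if not sep:
        # No explicit port in host_port: fall back to the SQL Server default.
        host, port = config["host_port"], "1433"
    kwargs = {
        "server": host,
        "port": int(port),
        "database": config.get("database"),
        "user": config.get("username"),
        "password": config.get("password"),
    }
    # connect_args (e.g. cafile, validate_host) are passed through unchanged.
    kwargs.update(config.get("options", {}).get("connect_args", {}))
    return kwargs


def ca_file_problem(kwargs: dict) -> Optional[str]:
    """Return a description of a problem with the CA file, or None if it loads."""
    cafile = kwargs.get("cafile")
    if cafile is None:
        return None  # no CA file configured, nothing to check
    if not os.path.isfile(cafile):
        return f"CA file not found: {cafile}"
    try:
        # Parsing the file into an SSL context fails if it holds no usable certificate.
        ssl.create_default_context(cafile=cafile)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"CA file could not be loaded: {exc}"
    return None


def try_connect(kwargs: dict) -> None:
    """Open one connection and run a trivial query; needs the python-tds package."""
    import pytds  # imported lazily so the helpers above work without it

    conn = pytds.connect(**kwargs)
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
    finally:
        conn.close()
```

For example, `ca_file_problem(pytds_kwargs_from_recipe(config))` catches a missing mount at `/certs/ca.pem`, and `try_connect(...)` with the same kwargs surfaces TLS or login errors without the rest of the ingestion pipeline around them.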
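If the logs shown in the UI are not detailed enough, one option is to run the same source configuration outside the UI through DataHub's Python `Pipeline` API with DEBUG logging turned on. This sketch assumes the `acryl-datahub` package with the `mssql` plugin is installed where it runs and that GMS is reachable at `http://localhost:8080` (adjust `gms_server` for your deployment). `build_recipe` and `run_recipe` are hypothetical helpers: the first adds an explicit `datahub-rest` sink, since the recipes above only define a source, and an optional `pipeline_name`, which stateful ingestion uses to key its saved state.

```python
import logging
from typing import Optional


def build_recipe(
    source_config: dict,
    gms_server: str = "http://localhost:8080",  # assumed GMS address
    pipeline_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper: wrap an MSSQL source config in a full recipe with a REST sink."""
    recipe = {
        "source": {"type": "mssql", "config": source_config},
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }
    if pipeline_name:
        # Stateful ingestion stores its checkpoints under this name.
        recipe["pipeline_name"] = pipeline_name
    return recipe


def run_recipe(recipe: dict) -> None:
    """Run the recipe with DEBUG logging; raises if the run reports failures."""
    # Imported here so build_recipe can be used without acryl-datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    logging.basicConfig(level=logging.DEBUG)
    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.pretty_print_summary()
    pipeline.raise_from_status()
```

A run would look like `run_recipe(build_recipe(config, pipeline_name="mssql_debug"))`, where `config` (a placeholder name) is the `config:` block of your recipe as a dict.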

Would you like more detailed guidance on any of these steps?
