Struggling with Secure DataHub Ingestion from SQL Server: Seeking Assistance with Configuration and Docker Image Build in Azure AKS

Original Slack Thread

Hi <@U07FV3UHE4D> - I am struggling to get DataHub to ingest data from SQL Server over a secure connection. I have followed the instructions in this <https://forum.datahubproject.io/t/seeking-help-with-datahub-ingestion-through-mssql-integration/1360/3|link>, but it doesn't work. Can anyone point me to a link with a detailed process for enabling a secure SQL Server connection? I have also tried installing the SQL Server drivers, but I was unable to. My environment is deployed in Azure AKS.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

To ingest from SQL Server over a secure connection, encryption has to be configured on the connection that DataHub's MSSQL source opens. According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/mssql|DataHub MSSQL documentation>, there are two main ways to do this, depending on which driver library the source uses: pytds or pyodbc.

  1. Using the pytds library (the default for type: mssql):

    • Pass the CA certificate file and enable host-name validation through options.connect_args, which is handed on to the pytds connection. Here is an example configuration:
    source:
      type: mssql
      config:
        host_port: your_host:1433
        database: your_database
        username: your_username
        password: your_password
        include_views: true
        include_tables: true
        profiling:
          enabled: true
          profile_table_level_only: true
        stateful_ingestion:
          enabled: true
        options:
          connect_args:
            cafile: /path/to/your/ca.pem
            validate_host: true
    
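  2. Using the pyodbc library:

    • This approach is often recommended for environments such as Azure SQL, where encryption is mandatory. It requires the Microsoft ODBC Driver for SQL Server to be installed wherever the ingestion runs, and the driver value under uri_args must match the installed driver's name exactly. Configure your ingestion recipe as follows: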
    source:
      type: mssql-odbc
      config:
        host_port: your_host:1433
        database: your_database
        username: your_username
        password: your_password
        use_odbc: "True"
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          Encrypt: "yes"
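          # "Yes" still encrypts the connection but skips validating the server certificate;
          # switch to "No" once the server's CA is trusted inside the image.
          TrustServerCertificate: "Yes"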
          ssl: "True"
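Either recipe can be exercised from a shell before it is wired into AKS. The following is a minimal sketch, not an official procedure: it assumes the DataHub CLI is installed, that the recipe is saved as recipe.yml (a hypothetical name), and that its password line reads password: ${MSSQL_PASSWORD} so the CLI fills in the secret from the environment. By default it only prints the commands; PRINT_ONLY=0 (a hypothetical switch) runs them.

```bash
#!/usr/bin/env bash
# Sketch: run a saved recipe with the DataHub CLI, dry run first.
# Assumptions: recipe.yml is a hypothetical file name, and its password line is
# `password: ${MSSQL_PASSWORD}` so the secret comes from the environment.
set -euo pipefail

# Print each command instead of executing it unless PRINT_ONLY=0.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ingest() {
  local recipe="${1:-recipe.yml}"
  # Only a real run needs the secret; printing the plan does not.
  if [ "${PRINT_ONLY:-1}" = "0" ] && [ -z "${MSSQL_PASSWORD:-}" ]; then
    echo "MSSQL_PASSWORD is not set" >&2
    return 1
  fi
  # --dry-run reads from SQL Server but skips writing to the DataHub sink;
  # with set -e, a failure here stops the script before the real run.
  run datahub ingest -c "$recipe" --dry-run
  run datahub ingest -c "$recipe"
}

ingest "$@"
```

Running the --dry-run pass first means a TLS or driver error surfaces before anything is written to DataHub.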
    

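AKS pods run whatever their container image contains, so the ODBC driver has to be baked into the image that executes your recipes; anything installed by hand inside a running pod is lost when the pod restarts. In practice that means building a custom Docker image that includes the necessary dependencies. Here is a basic Dockerfile example: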

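FROM acryldata/datahub-ingestion:latest
USER root
# "ODBC Driver 17 for SQL Server" is Microsoft's msodbcsql17 package; unixodbc-dev alone does not provide it.
# The debian/12 repo path is an assumption: match it to the base image's OS (see /etc/os-release).
RUN apt-get update && apt-get install -y curl gnupg unixodbc-dev \
 && curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg \
 && curl -fsSL https://packages.microsoft.com/config/debian/12/prod.list -o /etc/apt/sources.list.d/mssql-release.list \
 && apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql17
USER datahub
RUN pip install pyodbc 'acryl-datahub[mssql]'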

Build the Docker image with:

docker build -t datahub-ingestion-mssql .
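pyodbc can only load a driver that unixODBC has registered under exactly the name given in uri_args.driver, so it may be worth checking the image right after building it. A minimal sketch, assuming the image tag from the build command above and the driver name from the pyodbc recipe (both overridable through the hypothetical IMAGE and DRIVER_NAME variables):

```bash
#!/usr/bin/env bash
# Sketch: confirm the freshly built image registers the ODBC driver that the
# recipe names in uri_args.driver, before pushing it to a registry.
set -euo pipefail

IMAGE="${IMAGE:-datahub-ingestion-mssql}"
DRIVER_NAME="${DRIVER_NAME:-ODBC Driver 17 for SQL Server}"

# Succeed if stdin, in the "[Driver Name]"-per-line format printed by
# `odbcinst -q -d`, lists DRIVER_NAME exactly.
driver_listed() {
  grep -Fxq "[$DRIVER_NAME]"
}

check_image() {
  local drivers
  # --entrypoint replaces the image's default command with unixODBC's odbcinst.
  if ! drivers="$(docker run --rm --entrypoint odbcinst "$IMAGE" -q -d)"; then
    echo "could not run odbcinst in $IMAGE (is unixODBC installed?)" >&2
    return 1
  fi
  if printf '%s\n' "$drivers" | driver_listed; then
    echo "ok: $IMAGE registers $DRIVER_NAME"
  else
    echo "missing: $IMAGE does not register $DRIVER_NAME" >&2
    return 1
  fi
}

# Only check when Docker is usable and the image exists locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  check_image
else
  echo "image $IMAGE not available locally; skipping check" >&2
fi
```

If the check reports the driver as missing, the recipe's driver value and the installed package disagree, and pyodbc will fail to load the driver at ingestion time.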

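Then push the image to a registry your AKS cluster can pull from (for example, Azure Container Registry) and set the image repository and tag in your Helm values for the component that runs your recipes.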
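For the AKS side, one common route is Azure Container Registry (ACR): build the image there, grant the cluster pull access, and point the Helm release at it. The sketch below only prints the commands unless PRINT_ONLY=0. Every name in it is a placeholder assumption: the registry, cluster, resource group, tag, the Helm release name datahub, the chart reference datahub/datahub (from helm repo add datahub https://helm.datahubproject.io/), and HELM_IMAGE_KEY, which assumes scheduled CLI ingestion through the datahub-ingestion-cron subchart. Ingestion triggered from the DataHub UI runs in the acryl-datahub-actions component instead, so in that case both the values key and the Dockerfile's base image would need to change accordingly.

```bash
#!/usr/bin/env bash
# Sketch: build the image in ACR, let AKS pull from it, and point the Helm
# release at it. All names are hypothetical placeholders; override via env vars.
set -euo pipefail

ACR_NAME="${ACR_NAME:-myregistry}"
AKS_NAME="${AKS_NAME:-my-aks}"
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"
TAG="${TAG:-v1}"
# Assumption: values path for scheduled CLI ingestion via datahub-ingestion-cron.
HELM_IMAGE_KEY="${HELM_IMAGE_KEY:-datahub-ingestion-cron.image}"

# Print each command instead of executing it unless PRINT_ONLY=0.
run() {
  if [ "${PRINT_ONLY:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  local repo="${ACR_NAME}.azurecr.io/datahub-ingestion-mssql"
  # Build remotely in ACR from the directory containing the Dockerfile.
  run az acr build --registry "$ACR_NAME" --image "datahub-ingestion-mssql:${TAG}" .
  # Grant the cluster pull access to the registry (one-time setup).
  run az aks update --name "$AKS_NAME" --resource-group "$RESOURCE_GROUP" --attach-acr "$ACR_NAME"
  # Swap in the new image while keeping every other value of the release.
  run helm upgrade datahub datahub/datahub --reuse-values \
    --set "${HELM_IMAGE_KEY}.repository=${repo}" \
    --set "${HELM_IMAGE_KEY}.tag=${TAG}"
}

deploy
```

Because az acr build runs the build in Azure, Docker does not need to be installed on the machine that runs the script.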

If you need more detailed guidance on any of these steps, please let me know!


<@U06TM7M40RK> - I have tried the pytds approach, but during ingestion I get an error message saying that no arguments are allowed as options.

<@U06TM7M40RK> - Could I please have detailed information on how to build a custom Docker image for Azure AKS?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still need assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you'd like a person to take a look :slight_smile:

Hope this helps!

<@U07FV3UHE4D> - I have tried the pytds approach, but during ingestion I get an error message saying that no arguments are allowed as options.

<@U07FV3UHE4D> - Could I please have detailed information on how to build a custom Docker image for Azure AKS?

Thank you for letting us know you need a person to take a look at this thread.

Either a community member or one of our team members will try to assist when they have a moment, responses unfortunately might not be immediate. We hope this helps speed things along :slight_smile: