Troubleshooting Azure SQL Connection Encryption and ODBC Driver Installation

Original Slack Thread

Hello All, I am trying to connect DataHub to an Azure SQL instance. It is complaining that it is not encrypted. I added encrypt: yes
trustServerCertificate: true to the YAML recipe and then I get the error "Failed to configure the source (mssql) and extra fields not permitted for these. Any ideas or has anyone see this issue?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

To help you more effectively, could you please provide the following information?

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g., 0.12.0)?
  3. What data source(s) are you integrating with DataHub? (e.g., BigQuery)?

Based on the error message you provided, it seems that the issue is related to the encryption settings required by the Azure SQL server. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL documentation>, you need to enable encryption in your configuration. Here is an example of how you can modify your recipe to enable encryption using the pytds library:

source:
    type: mssql
    config:
        host_port: hosttest.database.windows.net
        database: test
        username: bbdd_test_r
        include_views: true
        include_tables: true
        profiling:
            enabled: true
            profile_table_level_only: true
        stateful_ingestion:
            enabled: true
        password: 'SbXh$%fkjssfdgd'
        options:
            connect_args:
                cafile: /path/to/your/ca.pem
                validate_host: true

If you are using ODBC, you can configure it as follows:

source:
    type: mssql-odbc
    config:
        host_port: hosttest.database.windows.net
        database: test
        username: bbdd_test_r
        password: 'SbXh$%fkjssfdgd'
        use_odbc: "True"
        uri_args:
            driver: "ODBC Driver 17 for SQL Server"
            Encrypt: "yes"
            TrustServerCertificate: "Yes"
            ssl: "True"

For more details, you can refer to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL documentation>.

Would you like more detailed guidance on setting this up?

Sources:

I am using the CLI. I’m using DataHub 0.13.3.3. I’m trying to access an AzureSQL database.

How can I see the ODBC driver installed?

<@U06TM7M40RK> I have modified my recipe now it is giving me error “Can’t open lib ‘ODBC Driver 17 for SQL Server’ : file not found (0) (SQLDriverConnect)”)". I have run pip install pyodbc and pip install acryl-datahub[mssql]. Why can it not find the driver?

The error message “Can’t open lib ‘ODBC Driver 17 for SQL Server’: file not found (0) (SQLDriverConnect)” indicates that the ODBC driver for SQL Server is not installed on your system. Installing the pyodbc library via pip is not sufficient; you also need to install the actual ODBC driver.

Here are the steps to install the ODBC driver on different operating systems:

For Ubuntu/Debian:

  1. Install the Microsoft ODBC Driver for SQL Server:

    sudo su
    curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
    curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list > /etc/apt/sources.list.d/mssql-release.list
    exit
    sudo apt-get update
    sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
    
  2. Install the unixODBC development headers:

    sudo apt-get install -y unixodbc-dev
    

For Red Hat/CentOS:

  1. Install the Microsoft ODBC Driver for SQL Server:

    sudo su
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/yum.repos.d/microsoft.asc
    curl https://packages.microsoft.com/config/rhel/7/prod.repo | sudo tee /etc/yum.repos.d/msprod.repo
    exit
    sudo yum update
    sudo ACCEPT_EULA=Y yum install -y msodbcsql17
    
  2. Install the unixODBC development headers:

    sudo yum install -y unixODBC-devel
    

For macOS:

  1. Install Homebrew if you haven’t already:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install the Microsoft ODBC Driver for SQL Server:

    brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
    brew update
    HOMEBREW_NO_AUTO_UPDATE=1 brew install msodbcsql17
    
  3. Install the unixODBC development headers:

    brew install unixodbc
    

Verify Installation:
After installing the ODBC driver, you can verify it by running the following command:

odbcinst -j

This command will show you the configuration files and the installed drivers.

Update Your DataHub Recipe:
Once the ODBC driver is installed, you can use the following recipe to connect to your Azure SQL database:

source:
    type: mssql-odbc
    config:
        host_port: 'hosttest.database.windows.net:1433'
        database: 'test'
        username: 'bbdd_test_r'
        password: 'SbXh$%fkjssfdgd'
        use_odbc: "True"
        uri_args:
            driver: "ODBC Driver 17 for SQL Server"
            Encrypt: "yes"
            TrustServerCertificate: "Yes"
            ssl: "True"

Would you like more detailed steps on any of these points?

Sources: