Hello All, I am trying to connect DataHub to an Azure SQL instance. It is complaining that it is not encrypted. I added encrypt: yes
trustServerCertificate: true to the YAML recipe and then I get the error "Failed to configure the source (mssql) and extra fields not permitted for these. Any ideas or has anyone see this issue?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a or so other folks in the Community know if you’re still looking for help on the topic!
To help you more effectively, could you please provide the following information?
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using? (e.g., 0.12.0)?
- What data source(s) are you integrating with DataHub? (e.g., BigQuery)?
Based on the error message you provided, it seems that the issue is related to the encryption settings required by the Azure SQL server. According to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL documentation>, you need to enable encryption in your configuration. Here is an example of how you can modify your recipe to enable encryption using the pytds
library:
source:
type: mssql
config:
host_port: hosttest.database.windows.net
database: test
username: bbdd_test_r
include_views: true
include_tables: true
profiling:
enabled: true
profile_table_level_only: true
stateful_ingestion:
enabled: true
password: 'SbXh$%fkjssfdgd'
options:
connect_args:
cafile: /path/to/your/ca.pem
validate_host: true
If you are using ODBC, you can configure it as follows:
source:
type: mssql-odbc
config:
host_port: hosttest.database.windows.net
database: test
username: bbdd_test_r
password: 'SbXh$%fkjssfdgd'
use_odbc: "True"
uri_args:
driver: "ODBC Driver 17 for SQL Server"
Encrypt: "yes"
TrustServerCertificate: "Yes"
ssl: "True"
For more details, you can refer to the <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL documentation>.
Would you like more detailed guidance on setting this up?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.
I am using the CLI. I’m using DataHub 0.13.3.3. I’m trying to access an AzureSQL database.
How can I see the ODBC driver installed?
<@U06TM7M40RK> I have modified my recipe now it is giving me error “Can’t open lib ‘ODBC Driver 17 for SQL Server’ : file not found (0) (SQLDriverConnect)”)". I have run pip install pyodbc and pip install acryl-datahub[mssql]. Why can it not find the driver?
The error message “Can’t open lib ‘ODBC Driver 17 for SQL Server’: file not found (0) (SQLDriverConnect)” indicates that the ODBC driver for SQL Server is not installed on your system. Installing the pyodbc
library via pip
is not sufficient; you also need to install the actual ODBC driver.
Here are the steps to install the ODBC driver on different operating systems:
For Ubuntu/Debian:
-
Install the Microsoft ODBC Driver for SQL Server:
sudo su curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list > /etc/apt/sources.list.d/mssql-release.list exit sudo apt-get update sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
-
Install the unixODBC development headers:
sudo apt-get install -y unixodbc-dev
For Red Hat/CentOS:
-
Install the Microsoft ODBC Driver for SQL Server:
sudo su curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/yum.repos.d/microsoft.asc curl https://packages.microsoft.com/config/rhel/7/prod.repo | sudo tee /etc/yum.repos.d/msprod.repo exit sudo yum update sudo ACCEPT_EULA=Y yum install -y msodbcsql17
-
Install the unixODBC development headers:
sudo yum install -y unixODBC-devel
For macOS:
-
Install Homebrew if you haven’t already:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
-
Install the Microsoft ODBC Driver for SQL Server:
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release brew update HOMEBREW_NO_AUTO_UPDATE=1 brew install msodbcsql17
-
Install the unixODBC development headers:
brew install unixodbc
Verify Installation:
After installing the ODBC driver, you can verify it by running the following command:
odbcinst -j
This command will show you the configuration files and the installed drivers.
Update Your DataHub Recipe:
Once the ODBC driver is installed, you can use the following recipe to connect to your Azure SQL database:
source:
type: mssql-odbc
config:
host_port: 'hosttest.database.windows.net:1433'
database: 'test'
username: 'bbdd_test_r'
password: 'SbXh$%fkjssfdgd'
use_odbc: "True"
uri_args:
driver: "ODBC Driver 17 for SQL Server"
Encrypt: "yes"
TrustServerCertificate: "Yes"
ssl: "True"
Would you like more detailed steps on any of these points?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/mssql|DataHub MSSQL Documentation>
- <Create new page · mkleehammer/pyodbc Wiki · GitHub Installation Wiki> 0 button 0 button Hint: Mention @runllm in the thread for followups.