Troubleshooting ODBC Driver for SQL Server in DataHub Ingestion on Windows 11

Original Slack Thread

Hi Team,
im trying to ingest SQL server data to datahub local instance. Im using Azure SQL server. i tried to connect with mssql and mssql-odbc. both of them doesn’t work. since the documentation says, mssql-odbc should be used for managed sql server instances, i followed the steps and installed the pre requisites pyodbc(can see in pip list output) and Microsoft ODBC Driver 17 for sql server(can see in control panel installed programs). I still get error saying driver not found. see the error below.

(Background on this error at: <https://sqlalche.me/e/14/dbapi>)```
what am i missing?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: _Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

To address the issue you’re encountering with the ODBC driver for SQL Server in your DataHub ingestion setup, it seems that the driver is not being found by the system. Here are some steps to troubleshoot and resolve the problem:

  1. Verify ODBC Driver Installation:
    Ensure that the ODBC Driver 17 for SQL Server is correctly installed on your system. You can check the installation by running the following command in your terminal:

    odbcinst -j
    

    This command will show the configuration files used by unixODBC. Make sure the driver is listed in the odbcinst.ini file.

  2. Update Dockerfile:
    If the driver is not installed in your Docker container, you need to update your Dockerfile to include the installation of the ODBC driver. Here is an example Dockerfile that installs the ODBC driver:

    FROM python:3.9-slim
    
    # Install unixODBC and the ODBC Driver for SQL Server
    RUN apt-get update && \
        apt-get install -y curl apt-transport-https gnupg && \
        curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
        curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
        apt-get update && \
        ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Install DataHub
    RUN pip install acryl-datahub
    
    # Set environment variables
    ENV ODBCINI=/etc/odbc.ini
    ENV ODBCSYSINI=/etc
    
    # Copy your DataHub recipe
    COPY recipe.yml /app/recipe.yml
    
    # Run DataHub ingestion
    CMD ["datahub", "ingest", "-c", "/app/recipe.yml"]
    
  3. Update Ingestion Recipe:
    Ensure your ingestion recipe is correctly configured. Here is an example configuration for the mssql-odbc source type:

    source:
      type: mssql-odbc
      config:
        host_port: 'your_host:1433'
        database: 'your_database'
        username: 'your_username'
        password: 'your_password'
        use_odbc: 'True'
        uri_args:
          driver: 'ODBC Driver 17 for SQL Server'
          Encrypt: 'yes'
          TrustServerCertificate: 'Yes'
          ssl: 'True'
    
  4. Run Docker Container:
    Build and run your Docker container with the updated Dockerfile:

    docker build -t datahub-ingestion-mssql .
    docker run --rm datahub-ingestion-mssql
    

Would you like more detailed guidance on any of these steps?

Sources:

<@U06TM7M40RK> im on a windows 11 machine using docker desktop

Given that you are using Docker Desktop on a Windows 11 machine, the issue likely stems from the Docker container not having the necessary ODBC driver installed. Here are the steps to resolve this:

  1. Create a Custom Docker Image:
    You need to create a custom Docker image that includes the Microsoft ODBC Driver for SQL Server. Below is an example Dockerfile that you can use:

    FROM python:3.9-slim
    
    # Install unixODBC and the ODBC Driver for SQL Server
    RUN apt-get update && \
        apt-get install -y curl apt-transport-https gnupg && \
        curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
        curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
        apt-get update && \
        ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Install DataHub
    RUN pip install acryl-datahub
    
    # Set environment variables
    ENV ODBCINI=/etc/odbc.ini
    ENV ODBCSYSINI=/etc
    
    # Copy your DataHub recipe
    COPY recipe.yml /app/recipe.yml
    
    # Run DataHub ingestion
    CMD ["datahub", "ingest", "-c", "/app/recipe.yml"]
    
  2. Build the Docker Image:
    Save the above Dockerfile in a directory and build the Docker image using the following command:

    docker build -t datahub-ingestion-mssql .
    
  3. Run the Docker Container:
    Run the Docker container with the newly built image:

    docker run --rm datahub-ingestion-mssql
    
  4. Update Ingestion Recipe:
    Ensure your ingestion recipe is correctly configured. Here is an example configuration for the mssql-odbc source type:

    source:
      type: mssql-odbc
      config:
        host_port: 'your_host:1433'
        database: 'your_database'
        username: 'your_username'
        password: 'your_password'
        use_odbc: 'True'
        uri_args:
          driver: 'ODBC Driver 17 for SQL Server'
          Encrypt: 'yes'
          TrustServerCertificate: 'Yes'
          ssl: 'True'
    

Would you like more detailed guidance on any of these steps?

Sources:

<@U06TM7M40RK> i used another image and installed the driver, still same error

If you have already installed the ODBC driver in your Docker container and are still encountering the same error, it might be due to the driver not being correctly registered or the environment not being properly configured. Here are some additional steps to troubleshoot and resolve the issue:

  1. Verify Driver Installation in Docker Container:
    Ensure that the ODBC driver is correctly installed and registered in the Docker container. You can do this by running the following commands inside the container:

    docker exec -it <container_id> bash
    odbcinst -j
    odbcinst -q -d -n "ODBC Driver 17 for SQL Server"
    

    These commands will show the configuration files used by unixODBC and list the installed ODBC drivers. Make sure the driver is listed.

  2. Check odbcinst.ini and odbc.ini Files:
    Ensure that the odbcinst.ini and odbc.ini files are correctly configured. Here is an example configuration:

    /etc/odbcinst.ini:

    [ODBC Driver 17 for SQL Server]
    Description=Microsoft ODBC Driver 17 for SQL Server
    Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1
    

    /etc/odbc.ini:

    [ODBC Data Sources]
    MSSQL=ODBC Driver 17 for SQL Server
    
    [MSSQL]
    Driver=ODBC Driver 17 for SQL Server
    Server=your_host
    Database=your_database
    
  3. Update Dockerfile:
    Ensure your Dockerfile includes the necessary steps to configure the ODBC driver. Here is an updated example:

    FROM python:3.9-slim
    
    # Install unixODBC and the ODBC Driver for SQL Server
    RUN apt-get update && \
        apt-get install -y curl apt-transport-https gnupg && \
        curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
        curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
        apt-get update && \
        ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Configure ODBC
    RUN echo "[ODBC Driver 17 for SQL Server]\nDescription=Microsoft ODBC Driver 17 for SQL Server\nDriver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1" > /etc/odbcinst.ini
    RUN echo "[ODBC Data Sources]\nMSSQL=ODBC Driver 17 for SQL Server\n\n[MSSQL]\nDriver=ODBC Driver 17 for SQL Server\nServer=your_host\nDatabase=your_database" > /etc/odbc.ini
    
    # Install DataHub
    RUN pip install acryl-datahub
    
    # Set environment variables
    ENV ODBCINI=/etc/odbc.ini
    ENV ODBCSYSINI=/etc
    
    # Copy your DataHub recipe
    COPY recipe.yml /app/recipe.yml
    
    # Run DataHub ingestion
    CMD ["datahub", "ingest", "-c", "/app/recipe.yml"]
    
  4. Run the Docker Container:
    Build and run your Docker container with the updated Dockerfile:

    docker build -t datahub-ingestion-mssql .
    docker run --rm datahub-ingestion-mssql
    

Would you like more detailed guidance on any of these steps?

Sources:

  • <Slack Discussion on MSSQL Ingestion> :+1: 0 button :-1: 0 button :bulb: Hint: Mention @runllm in the thread for followups.

<@U06TM7M40RK> above docker image build fails saying app/recipe.yaml not found

To resolve the issue of the Docker build failing due to the app/recipe.yml file not being found, you need to ensure that the recipe.yml file is present in the same directory as your Dockerfile. Here are the steps to follow:

  1. Ensure recipe.yml is in the Same Directory:
    Make sure that the recipe.yml file is in the same directory as your Dockerfile. The directory structure should look like this:

    /path/to/your/project
    ├── Dockerfile
    └── recipe.yml
    
  2. Update Dockerfile:
    Ensure your Dockerfile is correctly referencing the recipe.yml file. Here is the updated Dockerfile:

    FROM python:3.9-slim
    
    # Install unixODBC and the ODBC Driver for SQL Server
    RUN apt-get update && \
        apt-get install -y curl apt-transport-https gnupg && \
        curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
        curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
        apt-get update && \
        ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Configure ODBC
    RUN echo "[ODBC Driver 17 for SQL Server]\nDescription=Microsoft ODBC Driver 17 for SQL Server\nDriver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1" > /etc/odbcinst.ini
    RUN echo "[ODBC Data Sources]\nMSSQL=ODBC Driver 17 for SQL Server\n\n[MSSQL]\nDriver=ODBC Driver 17 for SQL Server\nServer=your_host\nDatabase=your_database" > /etc/odbc.ini
    
    # Install DataHub
    RUN pip install acryl-datahub
    
    # Set environment variables
    ENV ODBCINI=/etc/odbc.ini
    ENV ODBCSYSINI=/etc
    
    # Copy your DataHub recipe
    COPY recipe.yml /app/recipe.yml
    
    # Run DataHub ingestion
    CMD ["datahub", "ingest", "-c", "/app/recipe.yml"]
    
  3. Build the Docker Image:
    Navigate to the directory containing your Dockerfile and recipe.yml file, then build the Docker image using the following command:

    docker build -t datahub-ingestion-mssql .
    
  4. Run the Docker Container:
    Run the Docker container with the newly built image:

    docker run --rm datahub-ingestion-mssql
    

Would you like more detailed guidance on any of these steps?

Sources:

<@U06TM7M40RK> still unable to find recipe.yml after keeping the above structure

If the Docker build process is still unable to find the recipe.yml file despite having the correct directory structure, it might be due to the context in which the Docker build command is being run. Here are some steps to ensure the file is correctly included in the Docker build context:

  1. Verify Directory Structure:
    Ensure your directory structure looks like this:

    /path/to/your/project
    ├── Dockerfile
    └── recipe.yml
    
  2. Run Docker Build Command from the Correct Directory:
    Navigate to the directory containing your Dockerfile and recipe.yml file before running the Docker build command:

    cd /path/to/your/project
    docker build -t datahub-ingestion-mssql .
    
  3. Check Docker Ignore File:
    Ensure there is no .dockerignore file that might be excluding the recipe.yml file from the build context. If there is a .dockerignore file, make sure it does not contain an entry that excludes recipe.yml.

  4. Verify Dockerfile and Build Context:
    Ensure the COPY command in your Dockerfile is correctly referencing the recipe.yml file. Here is the Dockerfile again for reference:

    FROM python:3.9-slim
    
    # Install unixODBC and the ODBC Driver for SQL Server
    RUN apt-get update && \
        apt-get install -y curl apt-transport-https gnupg && \
        curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
        curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
        apt-get update && \
        ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    # Configure ODBC
    RUN echo "[ODBC Driver 17 for SQL Server]\nDescription=Microsoft ODBC Driver 17 for SQL Server\nDriver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1" > /etc/odbcinst.ini
    RUN echo "[ODBC Data Sources]\nMSSQL=ODBC Driver 17 for SQL Server\n\n[MSSQL]\nDriver=ODBC Driver 17 for SQL Server\nServer=your_host\nDatabase=your_database" > /etc/odbc.ini
    
    # Install DataHub
    RUN pip install acryl-datahub
    
    # Set environment variables
    ENV ODBCINI=/etc/odbc.ini
    ENV ODBCSYSINI=/etc
    
    # Copy your DataHub recipe
    COPY recipe.yml /app/recipe.yml
    
    # Run DataHub ingestion
    CMD ["datahub", "ingest", "-c", "/app/recipe.yml"]
    
  5. Debugging the Build Context:
    To debug the build context, you can create a simple Dockerfile that lists the contents of the build context:

    FROM alpine:latest
    COPY . /app
    RUN ls -la /app
    

    Build and run this Dockerfile to see if recipe.yml is being copied correctly:

    docker build -t debug-context .
    docker run --rm debug-context
    

If the recipe.yml file is still not being found, please provide the exact error message and any additional context that might help diagnose the issue.

Would you like more detailed guidance on any of these steps?

Sources:

docker build is fine but docker run fails,can anyone help with this?