Error ingesting metadata from MSSQL Server with 'ODBC Driver 17 for SQL Server' - UTF-16-LE codec issue

Original Slack Thread

Hello Team, I am getting the error below while ingesting metadata from MSSQL Server using "ODBC Driver 17 for SQL Server".

Error message: datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (mssql): 'utf-16-le' codec can't decode byte 0x44 in position 16: truncated data

I am using the CLI.

DataHub version: 0.10.5

Has anyone encountered the same issue? If so, I would appreciate it if you could share your experience. Thanks!
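Note: the "truncated data" wording in the error above is what Python's UTF-16-LE decoder reports when the input ends partway through a two-byte code unit. The sketch below reproduces that message with plain Python and no database. The sample bytes and the `decode_utf16le` helper are made up for illustration; the thread does not show what the driver actually returned, so this only demonstrates the general failure mode, not the confirmed root cause.

```python
def decode_utf16le(raw: bytes) -> str:
    """Decode bytes as UTF-16-LE; return the error text instead of raising."""
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


# 8 characters encode to 16 bytes of valid UTF-16-LE (2 bytes per character).
valid = "DataHub!".encode("utf-16-le")

# Appending one stray single-byte character (0x44 is ASCII "D") leaves
# an incomplete code unit at byte offset 16.
broken = valid + b"D"

print(decode_utf16le(valid))   # decodes cleanly
print(decode_utf16le(broken))  # fails on the trailing byte
```

An odd total byte count like this usually suggests that the data was not UTF-16-LE to begin with (for example, single-byte text read with a two-byte decoder), which is why the replies focus on the encoding setting.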

Hey <@U054RJ6EQBD>, when configuring the MSSQL source in DataHub, consider specifying the character encoding explicitly in the connection string. I hope this helps.

<@U06DLBKN7HV> thanks for your response. Could you please help me identify the config parameter in the documentation below? https://datahubproject.io/docs/generated/ingestion/sources/mssql/

Hey <@U054RJ6EQBD>, you can try adding encoding under the "# Options" comment, inside the "source: config:" section. I would also recommend using the latest DataHub version.

For example:

source:
  type: mssql
  config:
    # Coordinates
    host_port: localhost:1433
    database: DemoDatabase

    # Credentials
    username: admin
    password: password

    # Options
    encoding: "utf-16-le"
    use_odbc: "True"
    uri_args:
      driver: "ODBC Driver 17 for SQL Server"
      Encrypt: "yes"
      TrustServerCertificate: "Yes"
      ssl: "True"
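Note: when checking a recipe like the one above, it can help to see what the `uri_args` entries look like once they are URL-encoded onto an ODBC connection URL. The sketch below is a rough approximation using only the standard library. `build_odbc_url` is a hypothetical helper, not DataHub's code, and the `mssql+pyodbc://` URL shape is an assumption based on the usual SQLAlchemy convention. The `encoding` option is left out because the thread does not say how it is consumed.

```python
from urllib.parse import quote_plus, urlencode


def build_odbc_url(username, password, host_port, database, uri_args):
    """Hypothetical helper: assemble a SQLAlchemy-style mssql+pyodbc URL."""
    # urlencode escapes each key and value; spaces become "+".
    query = urlencode(uri_args)
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host_port}/{database}?{query}"
    )


# Values copied from the example recipe in the thread.
url = build_odbc_url(
    "admin",
    "password",
    "localhost:1433",
    "DemoDatabase",
    {
        "driver": "ODBC Driver 17 for SQL Server",
        "Encrypt": "yes",
        "TrustServerCertificate": "Yes",
        "ssl": "True",
    },
)
print(url)
```

Printing the URL this way makes it easy to confirm that the driver name matches an ODBC driver installed on the machine running the ingestion.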