Troubleshooting Classification Error on MSSQL Database in DataHub

Original Slack Thread

<@U06TM7M40RK> I am trying to run classification on a mssql db. I get the following error INFO sqlalchemy.engine.Engine SELECT TOP 120 *
FROM schema.[Table]
2024-12-04 13:30:24,207 INFO sqlalchemy.engine.Engine [generated in 0.00028s] {}
2024-12-04 13:30:24,208 INFO sqlalchemy.engine.Engine ROLLBACK
[2024-12-04 13:30:24,208] WARNING {datahub.ingestion.glossary.classification_mixin:119} - Failed to get sample values for dataset. Make sure you have granted SELECT permissions on dataset. db.schema.Table
[2024-12-04 13:30:24,208] DEBUG {datahub.ingestion.glossary.classification_mixin:123} - Error
Traceback (most recent call last):
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/sqlalchemy/engine/base.py”, line 1910, in _execute_context
self.dialect.do_execute(
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/sqlalchemy/engine/default.py”, line 736, in do_execute
cursor.execute(statement, parameters)
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/cursor.py”, line 322, in execute
self._session.execute(operation, params)
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 913, in execute
self.submit_plain_query(operation)
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 958, in submit_plain_query
self.cancel_if_pending()
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 732, in cancel_if_pending
self.process_cancel()
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 288, in process_cancel
token_id = self.get_token_id()
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 1589, in get_token_id
self.set_state(tds_base.TDS_READING)
File “/tmp/datahub/ingest/venv-mssql-8dd2d9fc4e9b7a66/lib/python3.10/site-packages/pytds/tds_session.py”, line 595, in set_state
raise tds_base.InterfaceError(
pytds.tds_base.InterfaceError: logic error: cannot change query state from DEAD to READING . I tested connectivity and that worked. Also ran the sql query inside a sql editor and it ran successfully so it seems the error is on Datahub side.

It seems likely that the issue you’re encountering is related to permissions or the configuration of your DataHub setup when trying to run classification on a MSSQL database. The error message indicates a failure to get sample values for the dataset, which suggests that there might be a permissions issue with the SELECT operation on the dataset.

Based on the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/mssql|DataHub documentation for MSSQL>, it is important to ensure that the user credentials used for the ingestion have the necessary SELECT permissions on the database objects you are trying to access. You might want to verify that the user has the appropriate permissions to execute SELECT queries on the specified tables.

Additionally, the error trace indicates an issue with the pytds library, which is one of the libraries used to connect to SQL Server. If you are using pytds, ensure that it is properly configured and that there are no connectivity issues. Alternatively, you can consider using pyodbc as the underlying library for connecting to SQL Server, as it might handle certain scenarios differently. If you choose to use pyodbc, make sure to change the source type from mssql to mssql-odbc in your configuration.

If you have verified the permissions and the issue persists, it might be helpful to check the DataHub logs for any additional error messages or warnings that could provide more context on the issue.

Would you like more detailed guidance on configuring the DataHub ingestion for MSSQL or troubleshooting the pytds library?

Sources:

Hi :slight_smile: ! Did this interaction resolve your issue or do you still require assistance ?

Please tag @ RunLLM if you still need further assistance !

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps ! ><