Issue Ingesting SQL Server into DataHub: pyodbc Installation Error

Original Slack Thread

<@U02G4B6ADL6> & @All, I tried to ingest SQL Server into DataHub and I am experiencing the following issue.

I have installed pyodbc on the AKS pod, but I am still getting the error below. Does anyone have an idea what this error is about?

Execution finished with errors.
{'exec_id': '773bd2c0-2052-4544-bf4a-04b62c746952',
 'infos': ['2023-09-25 16:14:06.955451 INFO: Starting execution for task with name=RUN_INGEST',
           "2023-09-25 16:14:08.975548 INFO: Failed to execute 'datahub ingest'",
           '2023-09-25 16:14:08.975713 INFO: Caught exception EXECUTING task_id=773bd2c0-2052-4544-bf4a-04b62c746952, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub  ingest run -c /tmp/datahub/ingest/773bd2c0-2052-4544-bf4a-04b62c746952/recipe.yml --report-to /tmp/datahub/ingest/773bd2c0-2052-4544-bf4a-04b62c746952/ingestion_report.json
[2023-09-25 16:14:08,389] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.4.2
[2023-09-25 16:14:08,441] INFO     {datahub.ingestion.run.pipeline:213} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-datahub-gms:8080>
[2023-09-25 16:14:08,813] ERROR    {datahub.entrypoints:199} - Command failed: Failed to configure the source (mssql): libodbc.so.2: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
    yield
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 226, in __init__
    self.source = source_class.create(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 195, in create
    return cls(config, ctx)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 125, in __init__
    for inspector in self.get_inspectors():
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 231, in get_inspectors
    url = self.config.get_sql_alchemy_url()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 77, in get_sql_alchemy_url
    import pyodbc  # noqa: F401
ImportError: libodbc.so.2: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 187, in run
    pipeline = Pipeline.create(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 336, in create
    return cls(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 225, in __init__
    with _add_init_error_context(f"configure the source ({source_type})"):
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 122, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (mssql): libodbc.so.2: cannot open shared object file: No such file or directory
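The root `ImportError` in the log above means the pyodbc wheel cannot find the system unixODBC runtime: pip-installing pyodbc is not enough on Linux, because the module links against a shared library (`libodbc.so.2`) that pip does not provide. A stdlib-only sketch of a quick diagnostic you could run inside the pod to see whether the dynamic loader can locate that library:

```python
# pip-installing pyodbc does not provide libodbc.so.2; that file comes from
# the OS-level unixODBC package. Check whether the loader can find it.
from ctypes.util import find_library

path = find_library("odbc")  # returns None when libodbc is not installed
if path is None:
    print("libodbc not found -- install unixODBC (e.g. apt-get install unixodbc)")
else:
    print(f"libodbc found at {path}")
```

If this prints "libodbc not found", installing the unixODBC system package (not another pip package) is the fix.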

Not really sure what’s happening here - <@U01GZEETMEZ> might be able to speak to this!

We’re triaging a regression in our Docker image build for the datahub-actions container. In the interim, could you try pinning to datahub-actions v0.0.13?

If that doesn’t fix it, then we can look into this specific issue in more depth
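For reference, pinning the actions container can be done through Helm values. A hypothetical sketch, assuming the standard acryl DataHub Helm chart layout (verify the key names against your chart version):

```yaml
# values.yaml -- pin the datahub-actions image to a known-good tag
acryl-datahub-actions:
  image:
    repository: acryldata/datahub-actions
    tag: v0.0.13
```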

<@U01GZEETMEZ>, thanks for your reply. I don’t understand what you mean by “could you try pinning to datahub-actions v0.0.13”?

<@U01GZEETMEZ>, we are already on version v0.0.13 of datahub-actions.

Ah, I see. Could you try changing the source type to mssql-odbc?
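For context, that change would look something like the recipe below. This is a sketch; the host, database, and credential values are placeholders, and the config field names are taken from the DataHub mssql source documentation (verify against your CLI version):

```yaml
# recipe.yml -- mssql-odbc variant (placeholder connection details)
source:
  type: mssql-odbc
  config:
    host_port: "my-sql-server.example.com:1433"
    database: my_database
    username: "${MSSQL_USER}"
    password: "${MSSQL_PASSWORD}"
    # uri_args are passed through to the ODBC driver
    uri_args:
      driver: "ODBC Driver 17 for SQL Server"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-datahub-gms:8080"
```

Note that this path still goes through pyodbc, so it still requires the unixODBC runtime to be present in the container.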

<@U01GZEETMEZ>, still no luck with this option; I am getting a different error.

Execution finished with errors.
{'exec_id': '98548553-0c03-4afe-af1f-794ffe0f5a70',
 'infos': ['2023-10-03 18:40:40.830911 INFO: Starting execution for task with name=RUN_INGEST',
           "2023-10-03 18:40:44.872069 INFO: Failed to execute 'datahub ingest'",
           '2023-10-03 18:40:44.872272 INFO: Caught exception EXECUTING task_id=98548553-0c03-4afe-af1f-794ffe0f5a70, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub  ingest run -c /tmp/datahub/ingest/98548553-0c03-4afe-af1f-794ffe0f5a70/recipe.yml --report-to /tmp/datahub/ingest/98548553-0c03-4afe-af1f-794ffe0f5a70/ingestion_report.json
[2023-10-03 18:40:42,977] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.4.2
[2023-10-03 18:40:43,033] INFO     {datahub.ingestion.run.pipeline:213} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-datahub-gms:8080>
[2023-10-03 18:40:43,945] ERROR    {datahub.entrypoints:199} - Command failed: Failed to configure the source (mssql-odbc): libodbc.so.2: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
    yield
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 226, in __init__
    self.source = source_class.create(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 195, in create
    return cls(config, ctx)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 125, in __init__
    for inspector in self.get_inspectors():
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 231, in get_inspectors
    url = self.config.get_sql_alchemy_url()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/sql/mssql.py", line 77, in get_sql_alchemy_url
    import pyodbc  # noqa: F401
ImportError: libodbc.so.2: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 187, in run
    pipeline = Pipeline.create(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 336, in create
    return cls(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 225, in __init__
    with _add_init_error_context(f"configure the source ({source_type})"):
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 122, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (mssql-odbc): libodbc.so.2: cannot open shared object file: No such file or directory

Does your recipe have use_odbc set to true? If so, can you try setting it to false? I believe ODBC might not be supported within UI ingestion.
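For reference, with `use_odbc` disabled the mssql source connects via a pure-Python TDS driver instead of pyodbc, which sidesteps the missing `libodbc.so.2` entirely. A sketch with placeholder connection details:

```yaml
# recipe.yml -- non-ODBC variant; no system ODBC libraries required
source:
  type: mssql
  config:
    host_port: "my-sql-server.example.com:1433"  # placeholder
    database: my_database
    use_odbc: false
```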

Just faced this issue myself. The problem is that using pyodbc on Linux requires you to install unixODBC first. You can refer to the <mkleehammer/pyodbc GitHub wiki> for steps on how to do this.
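One way to apply that fix in this setup is to extend the actions image so the unixODBC runtime is baked in. A sketch, assuming a Debian-based base image (the base tag, package names, and the non-root `datahub` user are assumptions to verify against your image):

```dockerfile
# Extend the actions image so libodbc.so.2 is present at ingestion time.
FROM acryldata/datahub-actions:v0.0.13
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends unixodbc unixodbc-dev \
    && rm -rf /var/lib/apt/lists/*
# Drop back to the image's non-root user (name assumed; check the base image)
USER datahub
```

After rebuilding, `ldconfig -p | grep libodbc` inside the container should list the library, and `import pyodbc` should succeed. If your recipe targets a SQL Server over ODBC you will also need an actual ODBC driver (e.g. Microsoft's msodbcsql package) on top of the unixODBC runtime.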