Troubleshooting Oracle Ingestion Error and Python Version Mismatch

Original Slack Thread

Hello everyone! After a “nuke” command :disappointed: we aren’t able to get the instance working again. We have a problem ingesting from Oracle:

Execution finished with errors.
{'exec_id': '57365a9a-2b25-4df0-93cf-030561856286',
 'infos': ['2025-02-04 15:58:20.951200 INFO: Starting execution for task with name=RUN_INGEST',
           "2025-02-04 15:58:25.005034 INFO: Failed to execute 'datahub ingest', exit code 1",
           '2025-02-04 15:58:25.005188 INFO: Caught exception EXECUTING task_id=57365a9a-2b25-4df0-93cf-030561856286, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 402, in '
           'execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/57365a9a-2b25-4df0-93cf-030561856286/recipe.yml --report-to /tmp/datahub/logs/57365a9a-2b25-4df0-93cf-030561856286/artifacts/ingestion_report.json
[2025-02-04 15:58:23,696] INFO     {datahub.cli.ingest_cli:150} - DataHub CLI version: 0.15.0.5rc8
[2025-02-04 15:58:23,698] INFO     {datahub.ingestion.run.pipeline:256} - No sink configured, attempting to use the default datahub-rest sink.
[2025-02-04 15:58:23,709] INFO     {datahub.ingestion.run.pipeline:273} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2025-02-04 15:58:24,021] ERROR    {datahub.entrypoints:235} - Command failed: Failed to find a registered source for type oracle: oracle is disabled due to a missing dependency: oracledb; try running `pip install 'acryl-datahub[oracle]'`
Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 126, in _ensure_not_lazy
    plugin_class = import_path(path)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 56, in import_path
    item = importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/source/sql/oracle.py", line 11, in <module>
    import oracledb
ModuleNotFoundError: No module named 'oracledb'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 138, in _add_init_error_context
    yield
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 289, in __init__
    source_class = source_registry.get(self.source_type)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 178, in get
    raise ConfigurationError(
datahub.configuration.common.ConfigurationError: oracle is disabled due to a missing dependency: oracledb; try running `pip install 'acryl-datahub[oracle]'`

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/entrypoints.py", line 222, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 476, in wrapper
    raise e
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 424, in wrapper
    res = func(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 171, in run
    pipeline = Pipeline.create(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 422, in create
    return cls(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 286, in __init__
    with _add_init_error_context(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 142, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type oracle: oracle is disabled due to a missing dependency: oracledb; try running `pip install 'acryl-datahub[oracle]'`

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

The error message you’re encountering indicates that the Oracle ingestion source is disabled due to a missing dependency, specifically the oracledb module. This is causing the ingestion process to fail. To resolve this issue, you need to install the necessary Oracle ingestion plugin for DataHub.

Here are the steps to resolve the issue:

  1. Install the Oracle Plugin:
    You need to install the acryl-datahub[oracle] package to enable the Oracle source. Run the following command in the environment where DataHub is set up:

    pip install 'acryl-datahub[oracle]'
    
  2. Verify Installation:
    After installation, verify that the plugin is active by running:

    datahub check plugins
    
  3. Re-run the Ingestion:
    Once the plugin is installed, try running the ingestion process again.

If you need more detailed instructions or encounter any issues during the installation, feel free to ask for further assistance.
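
A quick way to confirm whether a given interpreter can actually import oracledb is to ask that interpreter directly. The sketch below is not part of DataHub: `module_available` is a hypothetical helper name, and it relies only on the standard library.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The answer is only valid for the interpreter printed here.
    print(f"interpreter: {sys.executable}")
    for module in ("datahub", "oracledb"):
        status = "found" if module_available(module) else "MISSING"
        print(f"{module}: {status}")
```

Run it with the interpreter you want to check. Assuming the venv from the traceback follows the standard layout, that would be /datahub-ingestion/.venv/bin/python; the script name is up to you.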


It’s not working; the plugin seems to be installed fine, but I get the same error <@U06TM7M40RK>

<@U06TM7M40RK> It’s not working. The Oracle plugin seems to be installed and OK, but I get the same error when running the ingestion job.

If the Oracle plugin is installed but you’re still encountering the same error, there might be a few other things to check and try:

  1. Verify Python Environment:
    Ensure that the pip install 'acryl-datahub[oracle]' command was executed in the same Python environment where your DataHub ingestion is running. The traceback shows the ingestion running from /datahub-ingestion/.venv on Python 3.10, so the package has to be installed into that environment; installing it with another interpreter's pip leaves oracledb missing at run time.

  2. Check for Virtual Environment Activation:
    If you are using a virtual environment, make sure it is activated before running the ingestion command. You can activate it using:

    source /path/to/your/venv/bin/activate
    
  3. Reinstall the Plugin:
    Try uninstalling and reinstalling the plugin to ensure there are no issues with the installation:

    pip uninstall acryl-datahub
    pip install 'acryl-datahub[oracle]'
    
  4. Check for Conflicting Packages:
    Ensure there are no conflicting packages in your environment. You can list installed packages using:

    pip list
    
  5. Inspect Logs for Additional Errors:
    Check the logs for any additional error messages that might provide more context. You can do this by running:

    datahub --debug ingest -c /path/to/your/recipe.yml
    
  6. Verify Plugin Status:
    Run the following command to check the status of the installed plugins:

    datahub check plugins --verbose
    

If the issue persists, please provide any additional error messages or logs that might help in diagnosing the problem further.

Would you like more detailed steps or further assistance on any specific part of this process?
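
Several of these checks (same environment, active virtual environment, installed packages) can be done in one go with a small script that reports which interpreter is running and which versions of the relevant distributions it sees. This is a minimal sketch using only the standard library (`importlib.metadata` needs Python 3.8 or newer); the function names are illustrative and not part of the DataHub CLI.

```python
import sys
from importlib import metadata
from typing import Dict, Optional


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the facts needed to compare two Python environments."""
    return {
        "executable": sys.executable,
        "python": ".".join(map(str, sys.version_info[:3])),
        "virtualenv": str(in_virtualenv()),
        "acryl-datahub": installed_version("acryl-datahub"),
        "oracledb": installed_version("oracledb"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the script once from your shell and once with the interpreter that executes the ingestion makes any difference between the two environments visible side by side.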

<@U06TM7M40RK> it seems that datahub is configured on python 3.10 but i’m on 3.8

    self.gen.throw(typ, value, traceback)

It looks like there might be a mismatch between the Python version you are using and the one that DataHub is configured to use. Here are some steps to resolve this issue:

  1. Verify Python Version:
    Ensure that the Python version you are using is consistent across your environment. You can check the version by running:

    python3 --version
    
  2. Create a Virtual Environment:
    It’s a good practice to use a virtual environment to manage dependencies and Python versions. You can create a virtual environment with Python 3.10 using the following commands:

    python3.10 -m venv /path/to/your/venv
    source /path/to/your/venv/bin/activate
    
  3. Reinstall DataHub in the Virtual Environment:
    Once the virtual environment is activated, reinstall DataHub and the Oracle plugin:

    pip install --upgrade pip wheel setuptools
    pip install 'acryl-datahub[oracle]'
    
  4. Run the Ingestion Command:
    With the virtual environment activated, try running your ingestion command again.

  5. Check for Multiple Python Installations:
    Ensure that there are no conflicting Python installations on your system. You can list all Python installations using:

    ls /usr/bin/python*
    

If you continue to experience issues, please provide more details about your setup, such as the operating system and how Python is installed, so I can assist you further.

Would you like more detailed steps on any of these points?
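
The Python version DataHub runs on can be read straight from the site-packages paths in the traceback (for example .../lib/python3.10/site-packages/...). The sketch below, with hypothetical helper names, extracts that version and compares it with the interpreter executing the script. It assumes the conventional lib/pythonX.Y/site-packages layout and is only an illustration, not a DataHub utility.

```python
import re
import sys
from typing import Optional, Tuple

# Matches the "pythonX.Y" directory that precedes site-packages in a venv path.
_SITE_PACKAGES = re.compile(r"/lib/python(\d+)\.(\d+)/site-packages/")


def python_version_in_path(path: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) found in a site-packages path, or None if there is none."""
    match = _SITE_PACKAGES.search(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_running_interpreter(path: str) -> bool:
    """True if the version in `path` equals the version of this interpreter."""
    return python_version_in_path(path) == tuple(sys.version_info[:2])


if __name__ == "__main__":
    # Path copied from the traceback in this thread.
    frame = "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/entrypoints.py"
    print("traceback uses:", python_version_in_path(frame))
    print("running on:   ", tuple(sys.version_info[:2]))
    print("same version: ", matches_running_interpreter(frame))
```

If the two versions differ, packages installed with the local pip land in a different site-packages directory than the one the ingestion imports from, which would explain why the plugin looks installed but the error persists.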
