[2024-06-03 04:40:58,370] INFO {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.13.1.2
[2024-06-03 04:40:58,486] INFO {datahub.ingestion.run.pipeline:239} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
[2024-06-03 04:41:01,383] ERROR {datahub.entrypoints:201} - Command failed: Failed to find a registered source for type redshift: redshift is disabled due to an error in initialization
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 126, in _ensure_not_lazy
plugin_class = import_path(path)
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 56, in import_path
item = importlib.import_module(module_name)
File "/usr/lib64/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/source/redshift/redshift.py", line 47, in <module>
from datahub.ingestion.source.redshift.lineage import RedshiftLineageExtractor
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/source/redshift/lineage.py", line 34, in <module>
from datahub.ingestion.source.redshift.report import RedshiftReport
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/source/redshift/report.py", line 6, in <module>
from datahub.ingestion.source.sql.sql_generic_profiler import ProfilingSqlReport
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_generic_profiler.py", line 13, in <module>
from datahub.ingestion.source.ge_data_profiler import (
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 30, in <module>
from great_expectations.core.profiler_types_mapping import ProfilerTypeMapping
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/__init__.py", line 6, in <module>
from great_expectations.data_context.migrator.cloud_migrator import CloudMigrator
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/data_context/__init__.py", line 1, in <module>
from great_expectations.data_context.data_context import (
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/data_context/data_context/__init__.py", line 1, in <module>
from great_expectations.data_context.data_context.abstract_data_context import (
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 38, in <module>
from great_expectations.core import ExpectationSuite
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/core/__init__.py", line 3, in <module>
from .domain import Domain
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/core/domain.py", line 8, in <module>
from great_expectations.core.id_dict import IDDict
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/core/id_dict.py", line 5, in <module>
from great_expectations.core.util import convert_to_json_serializable
File "/home/ec2-user/.local/lib/python3.9/site-packages/great_expectations/core/util.py", line 32, in <module>
from IPython import get_ipython
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/__init__.py", line 53, in <module>
from .terminal.embed import embed
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/embed.py", line 16, in <module>
from IPython.terminal.interactiveshell import TerminalInteractiveShell
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/interactiveshell.py", line 48, in <module>
from .debugger import TerminalPdb, Pdb
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/debugger.py", line 8, in <module>
from .shortcuts import create_ipython_shortcuts
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/shortcuts/__init__.py", line 29, in <module>
from IPython.terminal.shortcuts.filters import filter_from_string
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/shortcuts/filters.py", line 18, in <module>
from prompt_toolkit.filters import (
ImportError: cannot import name 'has_suggestion' from 'prompt_toolkit.filters' (/home/ec2-user/.local/lib/python3.9/site-packages/prompt_toolkit/filters/__init__.py)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
yield
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 249, in __init__
source_class = source_registry.get(source_type)
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 180, in get
raise ConfigurationError(
datahub.configuration.common.ConfigurationError: redshift is disabled due to an error in initialization
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/entrypoints.py", line 188, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ec2-user/.local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
raise e
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
res = func(*args, **kwargs)
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 201, in run
ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 170, in run_ingestion_and_check_upgrade
pipeline = Pipeline.create(
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 363, in create
return cls(
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 249, in __init__
source_class = source_registry.get(source_type)
File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/ec2-user/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 123, in _add_init_error_context
raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type redshift: redshift is disabled due to an error in initialization```
Hi all,
I suddenly started getting this error while ingesting metadata into DataHub.
Has anyone faced this issue before? I would be grateful if someone could help me debug this or provide a fix. Thanks!
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to rate its responses so other folks in the Community know if you’re still looking for help on the topic!
To help you more effectively, could you please provide the following information:
- Are you using the UI or CLI for ingestion?
- Which DataHub version are you using (e.g., 0.13.1.2)?
- What data source(s) are you integrating with DataHub (e.g., Redshift)?
Based on the error message you provided, it seems that the Redshift source is disabled due to an error in initialization. This could be related to a dependency issue or a misconfiguration.
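Because the log contains chained tracebacks, the first one is the root cause and the later ones only wrap it. A minimal sketch of pulling that first exception line out of a saved log follows; the helper name `root_cause` and the assumption that exception lines start with a name ending in `Error` or `Exception` are illustrative, not part of DataHub.

```python
import re

# Marker Python prints between chained tracebacks raised with "raise ... from e".
CHAIN_MARKER = "The above exception was the direct cause of the following exception:"
# Assumption: an exception line starts with a (dotted) name ending in Error/Exception.
EXC_LINE = re.compile(r"^[\w.]+(Error|Exception)\b.*")


def root_cause(log_text):
    """Return the last exception line of the first traceback in the log, or None."""
    first_block = log_text.split(CHAIN_MARKER, 1)[0]
    matches = [
        line.strip()
        for line in first_block.splitlines()
        if EXC_LINE.match(line.strip())
    ]
    return matches[-1] if matches else None
```

Applied to the log above, this would point at the `ImportError` raised from `prompt_toolkit.filters` rather than at the `ConfigurationError` or `PipelineInitError` that wrap it.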
Potential Solutions:
- Dependency Issue: The error message indicates an issue with the prompt_toolkit library. You might want to check if all dependencies are correctly installed and compatible. Specifically, ensure that the prompt_toolkit version is compatible with your DataHub version.
- Configuration Issue: Double-check your recipe.dhub.yaml configuration file to ensure that all required fields are correctly specified.
- Version Compatibility: Ensure that you are using compatible versions of DataHub CLI and the Redshift connector. Sometimes upgrading or downgrading to a specific version can resolve such issues.
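To check the dependency angle, one option is to list the installed versions of the packages that appear in the traceback's import chain. This is only a sketch: the distribution names below are assumptions based on the usual pip names for those packages, and the helper `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names for the packages seen in the traceback.
PACKAGES = ["acryl-datahub", "great-expectations", "ipython", "prompt-toolkit"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter that runs the datahub CLI (python3.9 in the log above), otherwise it reports a different environment.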
Example Configuration:
Here is an example configuration for Redshift ingestion that you can compare with your recipe.dhub.yaml:
pipeline_name: 'Test Ingestion of Redshift'
source:
  type: redshift
  config:
    database: 'DB_NAME'
    host_port: 'DB_HOST:PORT'
    include_table_lineage: true
    include_tables: true
    include_views: false
    password: '********'
    profiling:
      enabled: true
      profile_table_level_only: true
    schema_pattern:
      allow:
        - SOME_SCHEMA_NAME
    stateful_ingestion:
      enabled: true
    table_lineage_mode: stl_scan_based
    username: DummyUserName
transformers:
  - type: set_dataset_browse_path
    config:
      path_templates:
        - '/PLATFORM/Test Ingestion of Redshift/DATASET_PARTS'
sink:
  type: datahub-rest
  config:
    server: '${DATAHUB_GMS_URL}'
    token: '${DATAHUB_GMS_TOKEN}'
Disabling Stateful Ingestion:
If the issue persists, you can try disabling stateful ingestion as a temporary workaround:
stateful_ingestion:
  enabled: false
References:
- <Slack Discussion on Redshift Ingestion Issues>
- <https://github.com/datahub-project/datahub/blob/master/docs/quick-ingestion-guides/redshift/configuration.md|Redshift Configuration Guide>
Would you like more detailed steps on any of these potential solutions?
- I am using the CLI for ingestion.
- DataHub version: 0.13.1.2
- I am only ingesting Redshift with DataHub.
This had been running for a long time, around 6-7 months, and then suddenly broke.
<@U06TM7M40RK>
Thank you for providing the additional details. Given that your setup was working for a long time and suddenly broke down, it is likely due to a change in dependencies or an update that caused the issue.
Steps to Troubleshoot and Resolve the Issue:
- Check Dependency Versions: The error message indicates an issue with the prompt_toolkit library. The installed IPython tries to import has_suggestion from prompt_toolkit.filters and fails, which suggests the installed prompt_toolkit is older than that IPython expects. You can try reinstalling or upgrading the prompt_toolkit library:
    pip install --upgrade prompt_toolkit
- Verify Configuration: Double-check your recipe.dhub.yaml configuration file to ensure that all required fields are correctly specified. Here is an example configuration for Redshift ingestion:
    pipeline_name: 'Test Ingestion of Redshift'
    source:
      type: redshift
      config:
        database: 'DB_NAME'
        host_port: 'DB_HOST:PORT'
        include_table_lineage: true
        include_tables: true
        include_views: false
        password: '********'
        profiling:
          enabled: true
          profile_table_level_only: true
        schema_pattern:
          allow:
            - SOME_SCHEMA_NAME
        stateful_ingestion:
          enabled: true
        table_lineage_mode: stl_scan_based
        username: DummyUserName
    transformers:
      - type: set_dataset_browse_path
        config:
          path_templates:
            - '/PLATFORM/Test Ingestion of Redshift/DATASET_PARTS'
    sink:
      type: datahub-rest
      config:
        server: '${DATAHUB_GMS_URL}'
        token: '${DATAHUB_GMS_TOKEN}'
- Disable Stateful Ingestion: As a temporary workaround, you can try disabling stateful ingestion to see if it resolves the issue:
    stateful_ingestion:
      enabled: false
- Update DataHub and Dependencies: Ensure that both the DataHub server and CLI are updated to the latest versions. Sometimes, updating to the latest version can resolve compatibility issues.
    pip install --upgrade acryl-datahub
- Check Logs for Detailed Errors: Review the logs from the DataHub ingestion executor and the DataHub GMS to identify any specific error messages. This can provide insights into why the jobs are marked as failed. You can access the logs by clicking on the Details button of the failed ingestion run in the UI.
Example Command to Run Ingestion:
Here is an example command to run the ingestion with the updated configuration:
datahub ingest -c recipe.dhub.yaml
References:
- <Slack Discussion on Redshift Ingestion Issues>
- <https://github.com/datahub-project/datahub/blob/master/docs/quick-ingestion-guides/redshift/configuration.md|Redshift Configuration Guide>
Would you like more detailed steps on any of these potential solutions?
It looks like you’ve installed some other libraries which are causing issues. In general, we recommend installing acryl-datahub in a venv to avoid these sorts of issues.
File "/home/ec2-user/.local/lib/python3.9/site-packages/IPython/terminal/shortcuts/filters.py", line 18, in <module>
from prompt_toolkit.filters import (
ImportError: cannot import name 'has_suggestion' from 'prompt_toolkit.filters' (/home/ec2-user/.local/lib/python3.9/site-packages/prompt_toolkit/filters/__init__.py)
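A quick way to confirm this diagnosis is to check whether the interpreter is running inside a venv and whether the installed prompt_toolkit actually exports the name IPython is asking for. The sketch below uses only the standard library; the helper names `exports_name` and `in_virtualenv` are hypothetical.

```python
import importlib
import sys


def exports_name(module_name, attr):
    """Return True if module_name imports cleanly and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    print(
        "prompt_toolkit.filters exports has_suggestion:",
        exports_name("prompt_toolkit.filters", "has_suggestion"),
    )
```

If the second line prints False in the environment that runs the ingestion, the installed prompt_toolkit does not match what the installed IPython expects, which is consistent with the ImportError quoted above.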