Troubleshooting DataHub Integration with Great Expectations on Python 3.10

Original Slack Thread

Hi Team, how can DataHub be integrated with Great Expectations on Python 3.10? When I follow the instructions here (https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/) and install the acryl-datahub[great-expectations] library, this error appears:

2023-09-19 11:33:19   File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-09-19 11:33:19     return _run_code(code, main_globals, None,
2023-09-19 11:33:19   File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
2023-09-19 11:33:19     exec(code, run_globals)
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/__main__.py", line 7, in <module>
2023-09-19 11:33:19     run()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
2023-09-19 11:33:19     WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/app/base.py", line 231, in run
2023-09-19 11:33:19     super().run()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/app/base.py", line 72, in run
2023-09-19 11:33:19     Arbiter(self).run()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 198, in run
2023-09-19 11:33:19     self.start()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 138, in start
2023-09-19 11:33:19     self.cfg.on_starting(self)
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/www/gunicorn_config.py", line 40, in on_starting
2023-09-19 11:33:19     ProvidersManager().connection_form_widgets
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 978, in connection_form_widgets
2023-09-19 11:33:19     self.initialize_providers_hooks()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 339, in wrapped_function
2023-09-19 11:33:19     func(*args, **kwargs)
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 421, in initialize_providers_hooks
2023-09-19 11:33:19     self.initialize_providers_list()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 339, in wrapped_function
2023-09-19 11:33:19     func(*args, **kwargs)
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 398, in initialize_providers_list
2023-09-19 11:33:19     self._discover_all_providers_from_packages()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers_manager.py", line 472, in _discover_all_providers_from_packages
2023-09-19 11:33:19     provider_info = entry_point.load()()
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/importlib_metadata/__init__.py", line 209, in load
2023-09-19 11:33:19     module = import_module(match.group('module'))
2023-09-19 11:33:19   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
2023-09-19 11:33:19     return _bootstrap._gcd_import(name[level:], package, level)
2023-09-19 11:33:19   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
2023-09-19 11:33:19   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
2023-09-19 11:33:19   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
2023-09-19 11:33:19   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
2023-09-19 11:33:19   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
2023-09-19 11:33:19   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2023-09-19 11:33:19   File "/home/airflow/.local/lib/python3.10/site-packages/datahub_provider/__init__.py", line 1, in <module>
2023-09-19 11:33:19     from datahub_airflow_plugin import get_provider_info
2023-09-19 11:33:19 ModuleNotFoundError: No module named 'datahub_airflow_plugin'
So, which library do I need to install to integrate DataHub with Great Expectations on Python 3.10?

Not really sure what’s happening here - <@U04NUK1721W> could you help me with this?

I want to ingest validation metadata from Great Expectations into DataHub. I deploy with Docker. On Python 3.7, the acryl-datahub[great-expectations] library worked fine. After I upgraded to Python 3.10 with the same requirements.txt file, the error I pointed out above appeared.

<@U04N9PYJBEW> can you help me with this?

<@U01GZEETMEZ> do you have any ideas here? Do we need to add acryl-datahub-airflow-plugin as a dependency to our Great Expectations plugin?

<@U01GZEETMEZ> could you help me?

This is likely a regression that will be fixed by this PR: https://github.com/datahub-project/datahub/pull/8861. It’s happening because you have both Airflow and DataHub installed in the same venv.
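
To confirm that an environment matches this diagnosis, a small check like the one below may help. It is only a sketch: the module names `datahub_provider` and `datahub_airflow_plugin` come from the traceback above, the distribution names `acryl-datahub` and `acryl-datahub-airflow-plugin` come from this thread, and `apache-airflow` is assumed to be how Airflow was installed. The helper names (`is_importable`, `installed_version`, `report`) are made up for illustration.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Collect one status line per module and per distribution of interest."""
    lines = []
    # Module names taken from the traceback in this thread.
    for module in ("datahub_provider", "datahub_airflow_plugin"):
        status = "found" if is_importable(module) else "missing"
        lines.append(f"module {module}: {status}")
    # Distribution names; "apache-airflow" is an assumption about the install.
    for dist in ("acryl-datahub", "acryl-datahub-airflow-plugin", "apache-airflow"):
        version = installed_version(dist)
        lines.append(f"package {dist}: {version or 'not installed'}")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

If `datahub_provider` is found while `datahub_airflow_plugin` is missing, and Airflow is installed alongside, the environment is consistent with the situation described here.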

In the interim, a workaround should be to use acryl-datahub 0.10.5.4

This should be fixed in 0.11.0.1
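
The two version numbers mentioned above can be turned into a rough check against whatever is installed. This is a sketch under assumptions: the thread only names 0.10.5.4 as a workaround and 0.11.0.1 as the expected fix, so every other version is reported as "unverified" rather than guessed at. The function names are hypothetical, and the parser only understands plain dotted integers (no `rc` or `.dev` suffixes).

```python
from importlib import metadata

WORKAROUND = (0, 10, 5, 4)  # workaround release named in the thread
FIXED = (0, 11, 0, 1)       # release the thread expects to contain the fix


def parse_version(text):
    """Turn '0.11.0.1' into (0, 11, 0, 1); raises ValueError on other formats."""
    return tuple(int(part) for part in text.split("."))


def classify(version_text):
    """Label a version using only what the thread states."""
    version = parse_version(version_text)
    if version >= FIXED:
        return "fixed"
    if version == WORKAROUND:
        return "workaround"
    # The thread does not say which other releases are affected.
    return "unverified"


if __name__ == "__main__":
    try:
        current = metadata.version("acryl-datahub")
    except metadata.PackageNotFoundError:
        print("acryl-datahub is not installed")
    else:
        print(f"acryl-datahub {current}: {classify(current)}")
```

In a Docker-based deployment, pinning the chosen version in requirements.txt keeps a Python upgrade from silently resolving to a different release.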

Thank you so much!