Troubleshooting Develop Mode CLI Issues in DataHub Metadata Ingestion Framework

Original Slack Thread

I’ve been trying to follow this documentation (https://datahubproject.io/docs/metadata-ingestion/developing/) for developing the metadata ingestion framework and I’m having issues running the CLI in develop mode with the latest code in the datahub repo.

To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/8.0.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build 
Configuration on demand is an incubating feature.

> Task :li-utils:mainDestroyStaleFiles SKIPPED
:li-utils:mainDestroyStaleFiles task is a NO-OP task.

> Task :li-utils:mainCopyPdscSchemas SKIPPED
:li-utils:mainCopyPdscSchemas task is a NO-OP task.

> Task :metadata-events:mxe-schemas:mainDestroyStaleFiles SKIPPED
:metadata-events:mxe-schemas:mainDestroyStaleFiles task is a NO-OP task.

> Task :metadata-events:mxe-schemas:mainCopyPdscSchemas SKIPPED
:metadata-events:mxe-schemas:mainCopyPdscSchemas task is a NO-OP task.

> Task :metadata-events:mxe-schemas:changedFilesReport
Checking idl and snapshot files for changes...

> Task :metadata-events:mxe-schemas:testDestroyStaleFiles SKIPPED
:metadata-events:mxe-schemas:testDestroyStaleFiles task is a NO-OP task.

> Task :metadata-events:mxe-schemas:testCopyPdscSchemas SKIPPED
:metadata-events:mxe-schemas:testCopyPdscSchemas task is a NO-OP task.

BUILD SUCCESSFUL in 16s
33 actionable tasks: 7 executed, 2 from cache, 24 up-to-date
richie.chen@MACC02CV6EGMD6R metadata-ingestion % source venv/bin/activate
(venv) richie.chen@MACC02CV6EGMD6R metadata-ingestion % datahub version
DataHub CLI version: unavailable (installed in develop mode)
Python version: 3.10.8 (v3.10.8:aaaf517424, Oct 11 2022, 10:14:40) [Clang 13.0.0 (clang-1300.0.29.30)]
(venv) richie.chen@MACC02CV6EGMD6R metadata-ingestion % datahub check plugins
Sources:
[2024-04-05 02:27:28,968] ERROR    {datahub.entrypoints:201} - Command failed: 'key already in use - athena'
Traceback (most recent call last):
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/cli/check_cli.py", line 151, in plugins
    click.echo(source_registry.summary(verbose=verbose, col_width=25))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 190, in summary
    self._materialize_entrypoints()
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 148, in _materialize_entrypoints
    self._load_entrypoint(entry_point_key)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 144, in _load_entrypoint
    self.register_lazy(entry_point.name, entry_point.value)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 107, in register_lazy
    self._register(key, import_path)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 97, in _register
    raise KeyError(f"key already in use - {key}")
KeyError: 'key already in use - athena'

Are you trying to add an `athena` source? Have you edited `setup.py` and added one? If so, it’s going to conflict with our existing one and you should name it something else. If neither, perhaps you can try checking out the most recent commit on the `master` branch and seeing if you can build the datahub CLI locally then. It’s hard to pinpoint the exact error without knowing what code changes you’ve made. FWIW, I just checked out master, deleted my venv, ran the steps in the doc to set up the Python environment, and was able to successfully run `datahub check plugins`.
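
For context, the `KeyError` in the traceback comes from a registry refusing to register the same key twice. A minimal sketch of that behavior, assuming a plain dict-backed registry (`MiniRegistry` and the import path are hypothetical stand-ins, not DataHub's actual code):

```python
class MiniRegistry:
    """Simplified stand-in for a plugin registry; not DataHub's real class."""

    def __init__(self):
        self._mapping = {}

    def register(self, key, import_path):
        # Refuse a second registration under the same key.
        if key in self._mapping:
            raise KeyError(f"key already in use - {key}")
        self._mapping[key] = import_path


if __name__ == "__main__":
    registry = MiniRegistry()
    # Hypothetical import path, used only for illustration.
    registry.register("athena", "my_pkg.athena:AthenaSource")
    try:
        # Registering the same key again mimics a duplicated entry point.
        registry.register("athena", "my_pkg.athena:AthenaSource")
    except KeyError as exc:
        print("Command failed:", exc)
```

So the error can appear either when a second `athena` source is added on purpose or when the same entry point is discovered twice.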

<@U04N9PYJBEW> I didn’t make any changes in my case. I cloned a fresh copy of the repo and ran all the steps. But it is reassuring that it works for you so I can narrow the problem down to something in my local environment

Ahh, you might need to run `python -m datahub check plugins`; your `datahub` executable might be coming from outside the venv.

You can run `which datahub` while the venv is activated to be sure.
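
The same check can be done from Python, which also shows which environment the interpreter itself belongs to. A rough sketch (the helper name `is_inside` is made up for illustration):

```python
import shutil
import sys
from pathlib import Path


def is_inside(path, root):
    """Return True if `path` is located under the directory `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    exe = shutil.which("datahub")  # PATH lookup, like `which datahub`
    print("datahub executable:", exe)
    print("interpreter prefix:", sys.prefix)
    if exe is not None:
        print("inside this environment:", is_inside(exe, sys.prefix))
```

Run it with the venv's `python`; if the last line prints `False`, the `datahub` command on PATH is probably not the develop mode install.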

When I’m outside the venv, I don’t have any issues with `datahub check plugins` using the installed version. When I’m inside the venv, I am able to do the following:

DataHub CLI version: unavailable (installed in develop mode)
Python version: 3.10.8 (v3.10.8:aaaf517424, Oct 11 2022, 10:14:40) [Clang 13.0.0 (clang-1300.0.29.30)]
which appears to be the develop mode version of the CLI (I think that's correct). I only run into issues using `datahub check plugins` in the venv with the develop mode CLI. Let me try with `python -m` in the venv and see what it says.

I do get the same issue with python -m datahub check plugins

outside venv for reference:

DataHub CLI version: 0.13.1.2
Python version: 3.10.8 (v3.10.8:aaaf517424, Oct 11 2022, 10:14:40) [Clang 13.0.0 (clang-1300.0.29.30)]
richie.chen@MACC02CV6EGMD6R metadata-ingestion % datahub check plugins
Sources:


Sinks:


Transformers:


For details on why a plugin is disabled, rerun with '--verbose'
If a plugin is disabled, try running: pip install 'acryl-datahub[<plugin>]'

output of which datahub in venv

/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/bin/datahub
(venv) richie.chen@MACC02CV6EGMD6R metadata-ingestion % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Hmm, that’s pretty odd, not sure what’s going on. <@U01GZEETMEZ> any thoughts here? There’s no way installing with a different Java version should have any effect here, right?

here’s the full output after I cleaned up my java installation and deleted the venv directory and went through all the steps again (splitting for length)

java 17.0.10 2024-01-16 LTS
Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.10+11-LTS-240, mixed mode, sharing)
richie.chen@MACC02CV6EGMD6R metadata-ingestion % python --version
Python 3.10.8
richie.chen@MACC02CV6EGMD6R metadata-ingestion % python3 --version
Python 3.10.8
richie.chen@MACC02CV6EGMD6R metadata-ingestion % pip --version
pip 24.0 from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip (python 3.10)
richie.chen@MACC02CV6EGMD6R metadata-ingestion % pip3 --version
pip 24.0 from /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip (python 3.10)
richie.chen@MACC02CV6EGMD6R metadata-ingestion % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
richie.chen@MACC02CV6EGMD6R metadata-ingestion % git rev-parse HEAD
e6dd0afcbf854f66328a1bfbd1057ee4e8fb085d
richie.chen@MACC02CV6EGMD6R metadata-ingestion % pwd 
/Users/richie.chen/code/external/datahub/metadata-ingestion
To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/8.0.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build 
Configuration on demand is an incubating feature.

> Task :li-utils:mainDestroyStaleFiles SKIPPED
:li-utils:mainDestroyStaleFiles task is a NO-OP task.

> Task :li-utils:mainCopyPdscSchemas SKIPPED
:li-utils:mainCopyPdscSchemas task is a NO-OP task.

> Task :metadata-events:mxe-schemas:mainDestroyStaleFiles SKIPPED
:metadata-events:mxe-schemas:mainDestroyStaleFiles task is a NO-OP task.

> Task :metadata-events:mxe-schemas:mainCopyPdscSchemas SKIPPED
:metadata-events:mxe-schemas:mainCopyPdscSchemas task is a NO-OP task.

> Task :metadata-events:mxe-schemas:changedFilesReport
Checking idl and snapshot files for changes...

> Task :metadata-events:mxe-schemas:testDestroyStaleFiles SKIPPED
:metadata-events:mxe-schemas:testDestroyStaleFiles task is a NO-OP task.

> Task :metadata-events:mxe-schemas:testCopyPdscSchemas SKIPPED
:metadata-events:mxe-schemas:testCopyPdscSchemas task is a NO-OP task.

> Task :metadata-ingestion:environmentSetup
Requirement already satisfied: pip in ./venv/lib/python3.10/site-packages (22.2.2)
Collecting pip
  Using cached pip-24.0-py3-none-any.whl (2.1 MB)
Collecting uv
  Using cached uv-0.1.29-py3-none-macosx_10_12_x86_64.whl (10.5 MB)
Collecting wheel
  Using cached wheel-0.43.0-py3-none-any.whl (65 kB)
Requirement already satisfied: setuptools>=63.0.0 in ./venv/lib/python3.10/site-packages (63.2.0)
Collecting setuptools>=63.0.0
  Using cached setuptools-69.2.0-py3-none-any.whl (821 kB)
Installing collected packages: wheel, uv, setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 63.2.0
    Uninstalling setuptools-63.2.0:
      Successfully uninstalled setuptools-63.2.0
  Attempting uninstall: pip
    Found existing installation: pip 22.2.2
    Uninstalling pip-22.2.2:
      Successfully uninstalled pip-22.2.2
Successfully installed pip-24.0 setuptools-69.2.0 uv-0.1.29 wheel-0.43.0
WARNING: There was an error checking the latest version of pip.

> Task :metadata-ingestion:installPackageOnly
+ uv pip install -e .
Built 1 editable in 6.72s
Resolved 51 packages in 75ms
Installed 51 packages in 109ms
 + acryl-datahub==1!0.0.0.dev0 (from file:///Users/richie.chen/code/external/datahub/metadata-ingestion)
 + aiohttp==3.9.3
 + aiosignal==1.3.1
 + annotated-types==0.6.0
 + async-timeout==4.0.3
 + attrs==23.2.0
 + avro==1.11.3
 + avro-gen3==0.7.12
 + cached-property==1.5.2
 + certifi==2024.2.2
 + charset-normalizer==3.3.2
 + click==8.1.7
 + click-default-group==1.2.4
 + click-spinner==0.1.10
 + deprecated==1.2.14
 + docker==7.0.0
 + expandvars==0.12.0
 + frozenlist==1.4.1
 + humanfriendly==10.0
 + idna==3.6
 + ijson==3.2.3
 + jsonref==1.1.0
 + jsonschema==4.21.1
 + jsonschema-specifications==2023.12.1
 + mixpanel==4.10.1
 + multidict==6.0.5
 + mypy-extensions==1.0.0
 + packaging==24.0
 + progressbar2==4.4.2
 + psutil==5.9.8
 + pydantic==2.6.4
 + pydantic-core==2.16.3
 + python-dateutil==2.9.0.post0
 + python-utils==3.8.2
 + pyyaml==6.0.1
 + referencing==0.34.0
 + requests==2.31.0
 + requests-file==2.0.0
 + rpds-py==0.18.0
 + ruamel-yaml==0.18.6
 + ruamel-yaml-clib==0.2.8
 + sentry-sdk==1.44.1
 + six==1.16.0
 + tabulate==0.9.0
 + termcolor==2.4.0
 + toml==0.10.2
 + typing-extensions==4.11.0
 + typing-inspect==0.9.0
 + urllib3==2.2.1
 + wrapt==1.16.0
 + yarl==1.9.4
+ touch venv/.build_install_package_only_sentinel

> Task :metadata-ingestion:installPackage
+ uv pip install -e .
Built 1 editable in 7.91s
Resolved 51 packages in 71ms
Installed 1 package in 5ms
 - acryl-datahub==1!0.0.0.dev0 (from file:///Users/richie.chen/code/external/datahub/metadata-ingestion)
 + acryl-datahub==1!0.0.0.dev0 (from file:///Users/richie.chen/code/external/datahub/metadata-ingestion)
+ touch venv/.build_install_package_sentinel
+ uv pip install -e '.[dev]'
Built 1 editable in 8.74s
Resolved 398 packages in 647ms
Installed 354 packages in 4.02s
 - acryl-datahub==1!0.0.0.dev0 (from file:///Users/richie.chen/code/external/datahub/metadata-ingestion)
 + acryl-datahub==1!0.0.0.dev0 (from file:///Users/richie.chen/code/external/datahub/metadata-ingestion)
 + acryl-datahub-classify==0.0.10
 + acryl-pyhive==0.6.16
 + acryl-sqlglot==23.2.1.dev5
 + aenum==3.1.15
 + alembic==1.13.1
 + altair==4.2.0
 + anyio==4.3.0
 + appdirs==1.4.4
 + appnope==0.1.4
 + argon2-cffi==23.1.0
 + argon2-cffi-bindings==21.2.0
 + asn1crypto==1.5.1
 + asttokens==2.4.1
 + asynch==0.2.3
 + beautifulsoup4==4.12.3
 + black==22.12.0
 + bleach==6.1.0
 + blinker==1.7.0
 + blis==0.7.11
 + boto3==1.34.79
 + boto3-stubs==1.28.15
 + botocore==1.34.79
 + botocore-stubs==1.34.69
 + bowler==0.9.0
 + bracex==2.4
 + build==1.2.1
 + cachetools==5.3.3
 + catalogue==2.0.10
 + cattrs==23.2.3
 + cffi==1.16.0
 + chardet==5.2.0
 + ciso8601==2.3.1
 + clickhouse-driver==0.2.7
 + clickhouse-sqlalchemy==0.2.4
 + cloudpickle==3.0.0
 + colorama==0.4.6
 + comm==0.2.2
 + confection==0.1.4
 + confluent-kafka==2.3.0
 + coverage==7.4.4
 + cryptography==42.0.5
 + cx-oracle==8.3.0
 + cymem==2.0.8
 + dask==2024.4.1
 + databricks-dbapi==0.6.0
 + databricks-sdk==0.24.0
 + databricks-sql-connector==2.9.5
 + dataflows-tabulator==1.54.3
 + db-dtypes==1.2.0
 + debugpy==1.8.1
 + decorator==5.1.1
 + deepdiff==6.7.1
 + defusedxml==0.7.1
 + deltalake==0.16.4
 + dill==0.3.8
 + docutils==0.20.1
 + duckdb==0.10.1
 + ecdsa==0.18.0
 + elasticsearch==7.13.4
 + entrypoints==0.4
 + et-xmlfile==1.1.0
 + exceptiongroup==1.2.0
 + executing==2.0.1
 + faker==24.7.1
 + fastapi==0.99.1
 + fastavro==1.9.4
 + fastjsonschema==2.19.1
 + feast==0.35.0
 + filelock==3.13.3
 + fissix==21.11.13
 + flake8==7.0.0
 + flake8-bugbear==23.3.12
 + flake8-tidy-imports==4.10.0
 + flask==3.0.2
 + flask-openid==1.3.0
 + flatdict==4.0.1
 + freezegun==1.4.0
 + fsspec==2023.12.2
 + future==1.0.0
 + geoalchemy2==0.14.7
 + gitdb==4.0.11
 + gitpython==3.1.43
 + google-api-core==2.18.0
 + google-auth==2.29.0
 + google-cloud-appengine-logging==1.4.3
 + google-cloud-audit-log==0.2.5
 + google-cloud-bigquery==3.20.1
 + google-cloud-core==2.4.1
 + google-cloud-datacatalog-lineage==0.2.2
 + google-cloud-logging==3.5.0
 + google-crc32c==1.5.0
 + google-resumable-media==2.7.0
 + googleapis-common-protos==1.63.0
 + great-expectations==0.15.50
 + greenlet==3.0.3
 + grpc-google-iam-v1==0.13.0
 + grpcio==1.62.1
 + grpcio-health-checking==1.62.1
 + grpcio-reflection==1.62.1
 + grpcio-status==1.62.1
 + grpcio-tools==1.62.1
 + gssapi==1.8.3
 + gunicorn==21.2.0
 + h11==0.14.0
 + httpcore==1.0.5
 + httptools==0.6.1
 + httpx==0.27.0
 + importlib-metadata==6.11.0
 + importlib-resources==6.4.0
 + iniconfig==2.0.0
 + ipaddress==1.0.23
 + ipykernel==6.17.1
 + ipython==8.21.0
 + ipython-genutils==0.2.0
 + ipywidgets==8.1.2
 + iso3166==2.1.1
 + isodate==0.6.1
 + isort==5.13.2
 + itsdangerous==2.1.2
 + jaraco-classes==3.4.0
 + jaraco-context==5.1.0
 + jaraco-functools==4.0.0
 + jedi==0.19.1
 + jinja2==3.1.3
 + jmespath==1.0.1
 + jpype1==1.5.0
 + jsonlines==4.0.0
 + jsonpatch==1.33
 + jsonpickle==3.0.3
 + jsonpointer==2.4
 + jupyter-client==7.4.9
 + jupyter-core==4.12.0
 + jupyter-server==1.16.0
 + jupyterlab-pygments==0.3.0
 + jupyterlab-widgets==3.0.10
 + keyring==25.1.0
 + langcodes==3.3.0
 + lark==1.1.4
 + leb128==1.0.7
 + linear-tsv==1.1.0
 + lkml==1.3.4
 + locket==1.0.0
 + looker-sdk==23.0.0
 + lxml==5.2.1
 + lz4==4.3.3
 + makefun==1.15.2
 + mako==1.3.2
 + markdown-it-py==3.0.0
 + markupsafe==2.1.5
 + marshmallow==3.21.1
 + matplotlib-inline==0.1.6
 + mccabe==0.7.0
 + mdurl==0.1.2
 + mistune==3.0.2
 + mlflow-skinny==2.11.3
 + mmh3==4.1.0
 + mmhash3==3.0.1
 + more-itertools==10.2.0
 + moreorless==0.4.0
 + moto==4.2.14
 + msal==1.22.0
 + murmurhash==1.0.10
 + mypy==1.0.0
 + mypy-boto3-dynamodb==1.28.73
 + mypy-boto3-glue==1.28.77
 + mypy-boto3-s3==1.28.55
 + mypy-boto3-sagemaker==1.28.15
 + mypy-boto3-sts==1.28.58
 + mypy-protobuf==3.1.0
 + nbclassic==1.0.0
 + nbclient==0.6.3
 + nbconvert==7.16.3
 + nbformat==5.10.4
 + nest-asyncio==1.6.0
 + networkx==3.2.1
 + nh3==0.2.17
 + notebook==6.5.6
 + notebook-shim==0.2.4
 + numpy==1.24.4
 + oauthlib==3.2.2
 + okta==1.7.0
 + openpyxl==3.1.2
 + ordered-set==4.1.0
 - packaging==24.0
 + packaging==23.2
 + pandas==1.5.3
 + pandavro==1.5.2
 + pandocfilters==1.5.1
 + parse==1.20.1
 + parso==0.8.4
 + partd==1.4.1
 + pathlib-abc==0.1.1
 + pathspec==0.12.1
 + pathy==0.11.0
 + pendulum==3.0.0
 + pexpect==4.9.0
 + phonenumbers==8.13.0
 + pkginfo==1.10.0
 + platformdirs==3.11.0
 + pluggy==1.4.0
 + preshed==3.0.9
 + prometheus-client==0.20.0
 + prompt-toolkit==3.0.43
 + proto-plus==1.23.0
 + protobuf==4.23.3
 + psycopg2-binary==2.9.9
 + ptyprocess==0.7.0
 + pure-eval==0.2.2
 + pure-sasl==0.6.2
 + py-partiql-parser==0.5.0
 + py4j==0.10.9.5
 + pyarrow==15.0.2
 + pyarrow-hotfix==0.6
 + pyasn1==0.6.0
 + pyasn1-modules==0.4.0
 + pyathena==2.25.2
 + pycodestyle==2.11.1
 + pycountry==23.12.11
 + pycparser==2.22
 + pycryptodome==3.20.0
 - pydantic==2.6.4
 + pydantic==1.10.15
 + pydash==8.0.0
 + pydeequ==1.1.1
 + pydruid==0.6.6
 + pyflakes==3.2.0
 + pygments==2.17.2
 + pyiceberg==0.4.0
 + pyjwt==2.8.0
 + pymysql==1.1.0
 + pyopenssl==24.1.0
 + pyparsing==3.0.9
 + pyproject-hooks==1.0.0
 + pyspark==3.3.4
 + pyspnego==0.10.2
 + pytest==8.1.1
 + pytest-asyncio==0.23.6
 + pytest-cov==5.0.0
 + pytest-docker==3.1.1
 + pytest-random-order==1.1.1
 + python-dotenv==1.0.1
 + python-jose==3.3.0
 + python-ldap==3.4.4
 + python-stdnum==1.20
 + python3-openid==3.2.0
 + pytz==2024.1
 + pyzmq==24.0.1
 + readme-renderer==43.0
 + redash-toolbelt==0.1.9
 + redshift-connector==2.1.0
 + regex==2023.12.25
 + requests-gssapi==1.3.0
 + requests-mock==1.12.1
 + requests-ntlm==1.2.0
 + requests-toolbelt==1.0.0
 + responses==0.25.0
 + rfc3986==2.0.0
 + rich==13.7.1
 + rsa==4.9
 - ruamel-yaml==0.18.6
 + ruamel-yaml==0.17.17
 + s3transfer==0.10.1
 + schwifty==2024.1.1.post0
 + scipy==1.13.0
 + scramp==1.4.4
 + send2trash==1.8.2
 + simple-salesforce==1.12.5
 + slack-sdk==3.18.1
 + smart-open==6.4.0
 + smmap==5.0.1
 + sniffio==1.3.1
 + snowflake-connector-python==3.7.1
 + snowflake-sqlalchemy==1.5.1
 + sortedcontainers==2.4.0
 + soupsieve==2.5
 + spacy==3.5.0
 + spacy-legacy==3.0.12
 + spacy-loggers==1.0.5
 + sql-metadata==2.2.2
 + sqlalchemy==1.4.44
 + sqlalchemy-bigquery==1.10.0
 + sqlalchemy-redshift==0.8.14
 + sqlalchemy2-stubs==0.0.2a38
 + sqllineage==1.3.8
 + sqlparse==0.4.4
 + srsly==2.4.8
 + stack-data==0.6.3
 + starlette==0.27.0
 + strictyaml==1.7.3
 + tableauserverclient==0.25
 + tableschema==1.20.11
 + tenacity==8.2.3
 + teradatasql==20.0.0.9
 + teradatasqlalchemy==20.0.0.0
 + terminado==0.18.1
 + thinc==8.1.12
 + thrift==0.16.0
 + thrift-sasl==0.4.3
 + time-machine==2.14.1
 + tinycss2==1.2.1
 + tomli==2.0.1
 + tomlkit==0.12.4
 + toolz==0.12.1
 + tornado==6.4
 + tqdm==4.66.2
 + traitlets==5.2.1.post0
 + trino==0.328.0
 + twine==5.0.0
 + typeguard==2.13.3
 + typer==0.7.0
 + types-awscrt==0.20.5
 + types-cachetools==5.3.0.7
 + types-click==0.1.12
 + types-click-spinner==0.1.13.20240311
 + types-dataclasses==0.6.6
 + types-deprecated==1.2.9.20240311
 + types-pkg-resources==0.1.3
 + types-protobuf==4.24.0.20240311
 + types-pymysql==1.1.0.1
 + types-pyopenssl==24.0.0.20240311
 + types-python-dateutil==2.9.0.20240316
 + types-pytz==2024.1.0.20240203
 + types-pyyaml==6.0.12.20240311
 + types-requests==2.31.0.3
 + types-s3transfer==0.10.0
 + types-six==1.16.21.20240311
 + types-tabulate==0.9.0.20240106
 + types-termcolor==1.1.6.2
 + types-toml==0.10.8.20240310
 + types-ujson==5.9.0.0
 + types-urllib3==1.26.25.14
 + tzdata==2024.1
 + tzlocal==5.2
 + ujson==5.9.0
 + unicodecsv==0.14.1
 - urllib3==2.2.1
 + urllib3==1.26.18
 + uvicorn==0.29.0
 + uvloop==0.19.0
 + vertica-python==1.3.8
 + vertica-sqlalchemy-dialect==0.0.8.1
 + vininfo==1.8.0
 + volatile==2.1.0
 + wasabi==1.1.2
 + watchfiles==0.21.0
 + wcmatch==8.5.1
 + wcwidth==0.2.13
 + webencodings==0.5.1
 + websocket-client==1.7.0
 + websockets==12.0
 + werkzeug==3.0.2
 + widgetsnbextension==4.0.10
 + xlrd==2.0.1
 + xmltodict==0.13.0
 + zeep==4.2.1
 + zipp==3.18.1
 + zstd==1.5.5.1
+ touch venv/.build_install_dev_sentinel

BUILD SUCCESSFUL in 54s
33 actionable tasks: 11 executed, 1 from cache, 21 up-to-date
richie.chen@MACC02CV6EGMD6R metadata-ingestion % source venv/bin/activate
(venv) richie.chen@MACC02CV6EGMD6R metadata-ingestion % datahub version
DataHub CLI version: unavailable (installed in develop mode)
Python version: 3.10.8 (v3.10.8:aaaf517424, Oct 11 2022, 10:14:40) [Clang 13.0.0 (clang-1300.0.29.30)]
(venv) richie.chen@MACC02CV6EGMD6R metadata-ingestion % datahub check plugins
Sources:
[2024-04-05 15:06:51,534] ERROR    {datahub.entrypoints:201} - Command failed: 'key already in use - athena'
Traceback (most recent call last):
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/cli/check_cli.py", line 151, in plugins
    click.echo(source_registry.summary(verbose=verbose, col_width=25))
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 190, in summary
    self._materialize_entrypoints()
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 148, in _materialize_entrypoints
    self._load_entrypoint(entry_point_key)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 144, in _load_entrypoint
    self.register_lazy(entry_point.name, entry_point.value)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 107, in register_lazy
    self._register(key, import_path)
  File "/Users/richie.chen/code/external/datahub/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 97, in _register
    raise KeyError(f"key already in use - {key}")
KeyError: 'key already in use - athena'

I think I found what the issue is. This issue summarizes it nicely: https://github.com/pypa/setuptools/issues/3649

Something is causing the entry point keys to be duplicated, which is why it says “key already in use”: the registry tries to register the same key a second time. Deleting the .egg-info file fixed the error for me. I think it worked for you because you might be on py3.8 or py3.9, which use importlib_metadata instead of importlib.metadata.
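
One way to confirm the duplication is to list entry point names that show up more than once. A minimal sketch, assuming DataHub's source plugins are registered under the `datahub.ingestion.source.plugins` entry point group (an assumption, so adjust the group name if it differs); `find_duplicate_names` is a hypothetical helper, not part of DataHub:

```python
from collections import defaultdict
from importlib.metadata import entry_points

# Assumption: the entry point group used for DataHub source plugins.
SOURCE_GROUP = "datahub.ingestion.source.plugins"


def find_duplicate_names(eps):
    """Map each entry point name seen more than once to its import paths."""
    seen = defaultdict(list)
    for ep in eps:
        seen[ep.name].append(ep.value)
    return {name: values for name, values in seen.items() if len(values) > 1}


def load_group(group):
    """Return the entry points in `group` across Python versions."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, ()))  # Python 3.8 / 3.9 return a dict


if __name__ == "__main__":
    duplicates = find_duplicate_names(load_group(SOURCE_GROUP))
    for name, values in sorted(duplicates.items()):
        print(f"{name}: found {len(values)} times -> {values}")
    if not duplicates:
        print("no duplicate entry point names found")
```

Run it with the venv's interpreter. If `athena` or other sources are listed, that would suggest two copies of the package metadata are being discovered, which matches the stale .egg-info explanation.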