Finding Python Libraries used for Metadata Ingestion in DataHub Version 0.14.0.2

Original Slack Thread

<@U06TM7M40RK> I'm using DataHub 0.14.0.2. How can I check the list of Python libraries that metadata-ingestion uses?

To check the list of Python libraries used in metadata ingestion for DataHub version 0.14.0.2, you can refer to the setup.py file in the DataHub repository. This file contains the dependencies required for metadata ingestion.

Here is a step-by-step guide to find the list of Python libraries:

  1. Navigate to the setup.py file:
    The setup.py file is located in the metadata-ingestion directory of the DataHub repository. You can view it on GitHub at https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py (this link shows the master branch; use GitHub's branch/tag selector to switch to the tag for your release).

  2. Check the install_requires and extras_require sections:
    The install_requires argument in setup.py lists the core Python libraries that every installation needs. Libraries needed only by a specific plugin are listed per plugin under extras_require.
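If the package is already installed, the same information can also be read from its installed metadata instead of from GitHub. The following is a minimal sketch, assuming the `acryl-datahub` package is installed in the current Python environment; `split_requirements` is a hypothetical helper name, not part of DataHub. It uses the standard library's `importlib.metadata.requires` and separates core requirements from those tied to an extra.

```python
import re
from collections import defaultdict
from importlib import metadata

# Matches the `extra == "name"` part of an environment marker.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_requirements(requires):
    """Split requirement strings into (core, extras-by-name).

    Hypothetical helper: `requires` is a list of strings in the form
    returned by importlib.metadata.requires().
    """
    core = []
    extras = defaultdict(list)
    for req in requires:
        name, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        if match:
            # Keep only the requirement itself; the marker is dropped.
            extras[match.group(1)].append(name.strip())
        else:
            core.append(req.strip())
    return core, dict(extras)


if __name__ == "__main__":
    try:
        declared = metadata.requires("acryl-datahub") or []
    except metadata.PackageNotFoundError:
        declared = []
        print("acryl-datahub is not installed in this environment")

    core, extras = split_requirements(declared)
    print("Core requirements:")
    for req in sorted(core):
        print("  ", req)
    print("Extras defined:", ", ".join(sorted(extras)))
```

Because this reads the metadata of whatever version is installed, it reflects that release rather than the master branch.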

Here is an example snippet from the setup.py file:

setuptools.setup(
    # Package metadata.
    name=package_metadata["__package_name__"],
    version=_version,
    url="https://datahubproject.io/",
    project_urls={
        "Documentation": "https://datahubproject.io/docs/",
        "Source": "https://github.com/datahub-project/datahub",
        "Changelog": "https://github.com/datahub-project/datahub/releases",
        "Releases": "https://github.com/acryldata/datahub/releases",
    },
    license="Apache License 2.0",
    description="A CLI to work with DataHub metadata",
    long_description="""\
The `acryl-datahub` package contains a CLI and SDK for interacting with DataHub,
as well as an integration framework for pulling/pushing metadata from external systems.

See the [DataHub docs](https://datahubproject.io/docs/metadata-ingestion).
""",
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: System Administrators",
        "License :: OSI Approved",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: Unix",
        "Operating System :: POSIX :: Linux",
        "Environment :: Console",
        "Environment :: MacOS X",
        "Topic :: Software Development",
    ],
    # Package info.
    zip_safe=False,
    python_requires=">=3.8",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    package_data={
        "datahub": ["py.typed"],
        "datahub.metadata": ["schema.avsc"],
        "datahub.metadata.schemas": ["*.avsc"],
        "datahub.ingestion.source.powerbi": ["powerbi-lexical-grammar.rule"],
    },
    entry_points=entry_points,
    # Dependencies.
    install_requires=list(base_requirements | framework_common),
    extras_require={
        "base": list(framework_common),
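        **{
            plugin: list(
                framework_common
                | (
                    plugin_common
                    if plugin
                    not in {
                        "airflow",
                        "datahub-rest",
                        "datahub-kafka",
                        "sync-file-emitter",
                    }
                    # A conditional expression needs an else branch to be valid Python.
                    else set()
                )
                # Add the plugin's own dependency set.
                | dependencies
            )
            for (plugin, dependencies) in plugin_requirements.items()
        },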
    },
)
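To make the set arithmetic in `install_requires` and `extras_require` easier to follow, here is a toy sketch. The variable names `base_requirements`, `framework_common`, `plugin_common`, and `plugin_requirements` come from the snippet above; the package names inside them are placeholders chosen for illustration, and `plugins_without_common` is a hypothetical name for the exclusion set. The sketch also assumes each plugin maps to its own set of dependencies that is unioned in.

```python
# Toy stand-ins: the real sets in setup.py are much larger.
base_requirements = {"typing_extensions"}
framework_common = {"click", "pydantic"}
plugin_common = {"sqlparse"}

# Assumed shape: plugin name -> that plugin's own dependencies.
plugin_requirements = {
    "datahub-rest": {"requests"},
    "mysql": {"pymysql"},
}

# Plugins that skip plugin_common (same names as in the snippet).
plugins_without_common = {
    "airflow",
    "datahub-rest",
    "datahub-kafka",
    "sync-file-emitter",
}

# Installed for everyone: `pip install acryl-datahub`.
install_requires = sorted(base_requirements | framework_common)

# Installed only on request: `pip install 'acryl-datahub[mysql]'`.
extras_require = {
    plugin: sorted(
        framework_common
        | (plugin_common if plugin not in plugins_without_common else set())
        | dependencies
    )
    for plugin, dependencies in plugin_requirements.items()
}

print(install_requires)
print(extras_require["mysql"])
print(extras_require["datahub-rest"])
```

In this toy data, `mysql` picks up `sqlparse` from `plugin_common`, while `datahub-rest` does not because it is in the exclusion set.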

Would you like more detail on any specific part of this configuration or the list of libraries?

<@U06TM7M40RK> Please explain the above in Korean.

To check the list of Python libraries used by metadata-ingestion in DataHub version 0.14.0.2, refer to the setup.py file in the DataHub repository. This file lists the dependencies that metadata-ingestion requires.

Here is a step-by-step guide to checking the list of Python libraries:

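  1. Navigate to the setup.py file:
    The setup.py file is in the metadata-ingestion directory of the DataHub repository. You can view it on GitHub at https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py (this link shows the master branch; use GitHub's branch/tag selector to switch to the tag for your release).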

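  2. Check the install_requires and extras_require sections:
    The install_requires argument in setup.py lists the core Python libraries that every installation needs. Libraries needed only by a specific plugin are listed per plugin under extras_require.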
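Once you have a list of requirement strings, it can be useful to see which of those libraries are actually present in your environment and at what version. This is a minimal sketch using the standard library's `importlib.metadata.version`; `requirement_name` and `installed_versions` are hypothetical helper names, and the example assumes the `acryl-datahub` package is installed.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the distribution name at the start of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"not a requirement string: {requirement!r}")
    return match.group(0)


def installed_versions(requirements):
    """Map each required distribution to its installed version, or None."""
    versions = {}
    for requirement in requirements:
        name = requirement_name(requirement)
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Declared as a dependency but not installed here,
            # e.g. because it belongs to an extra you did not install.
            versions[name] = None
    return versions


if __name__ == "__main__":
    try:
        declared = metadata.requires("acryl-datahub") or []
    except metadata.PackageNotFoundError:
        declared = []
        print("acryl-datahub is not installed in this environment")

    for name, version in sorted(installed_versions(declared).items()):
        print(f"{name}: {version or 'not installed'}")
```

Entries reported as not installed are usually dependencies of plugins whose extras were not requested at install time.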

Here is an example snippet from the setup.py file:

setuptools.setup(
    # Package metadata.
    name=package_metadata["__package_name__"],
    version=_version,
    url="https://datahubproject.io/",
    project_urls={
        "Documentation": "https://datahubproject.io/docs/",
        "Source": "https://github.com/datahub-project/datahub",
        "Changelog": "https://github.com/datahub-project/datahub/releases",
        "Releases": "https://github.com/acryldata/datahub/releases",
    },
    license="Apache License 2.0",
    description="A CLI to work with DataHub metadata",
    long_description="""\
The `acryl-datahub` package contains a CLI and SDK for interacting with DataHub,
as well as an integration framework for pulling/pushing metadata from external systems.

See the [DataHub docs](https://datahubproject.io/docs/metadata-ingestion).
""",
    long_description_content_type="text/markdown",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "Intended Audience :: System Administrators",
        "License :: OSI Approved",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: Unix",
        "Operating System :: POSIX :: Linux",
        "Environment :: Console",
        "Environment :: MacOS X",
        "Topic :: Software Development",
    ],
    # Package info.
    zip_safe=False,
    python_requires=">=3.8",
    package_dir={"": "src"},
    packages=setuptools.find_namespace_packages(where="./src"),
    package_data={
        "datahub": ["py.typed"],
        "datahub.metadata": ["schema.avsc"],
        "datahub.metadata.schemas": ["*.avsc"],
        "datahub.ingestion.source.powerbi": ["powerbi-lexical-grammar.rule"],
    },
    entry_points=entry_points,
    # Dependencies.
    install_requires=list(base_requirements | framework_common),
    extras_require={
        "base": list(framework_common),
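        **{
            plugin: list(
                framework_common
                | (
                    plugin_common
                    if plugin
                    not in {
                        "airflow",
                        "datahub-rest",
                        "datahub-kafka",
                        "sync-file-emitter",
                    }
                    # A conditional expression needs an else branch to be valid Python.
                    else set()
                )
                # Add the plugin's own dependency set.
                | dependencies
            )
            for (plugin, dependencies) in plugin_requirements.items()
        },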
    },
)

Would you like more detailed information?
