Troubleshooting Custom Ingestion Source Integration in Self-Deployed DataHub App

Original Slack Thread

Hi all, I am trying to add a demo custom ingestion source to my self-deployed DataHub app. So far I have followed the relevant docs and the advice from <@U0348BYAS56> and <@U05AW4DVBAA>, but I have not been able to get a custom ingestion source working.

What I have been doing:

Working from within this directory: datahub/datahub-actions (I have git-cloned both repositories)

  1. I created a folder named custom-metadata-ingestion with this tree:
```├── build
│  ├── bdist.macosx-13-x86_64
│  └── lib
│    └── matillion
│      ├── __init__.py
│      └── matillion.py
├── dist
│  ├── matillion-1.0-py3-none-any.whl
│  └── matillion-1.0.tar.gz
├── matillion
│  ├── __init__.py
│  ├── __pycache__
│  │  └── __init__.cpython-310.pyc
│  └── src
│    ├── __init__.py
│    ├── matillion
│    │  ├── __init__.py
│    │  └── matillion.py
│    └── matillion.egg-info
│      ├── PKG-INFO
│      ├── SOURCES.txt
│      ├── dependency_links.txt
│      ├── not-zip-safe
│      ├── requires.txt
│      └── top_level.txt
└── setup.py```
I have created a Python package out of it with this setup.py:
```import setuptools

setuptools.setup(
    name="matillion",
    version="1.0",
    description="A custom framework to ingest metadata into DataHub",
    zip_safe=False,
    install_requires=["virtualenv", "build", "acryl-datahub==0.11.0"],
    packages=setuptools.find_packages(where='matillion/src'),
    package_dir={
        '': 'matillion/src'
    },
)```
My custom ingestion source is fully based on this: <https://github.com/acryldata/meta-world/tree/master/custom_sources>
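
One quick sanity check on the packaging step is to list what actually ended up in the built wheel. The sketch below uses only the standard library; the function name `modules_in_wheel` is hypothetical, and the wheel path in the comment is taken from the tree above. The build/lib layout suggests the wheel should contain a `matillion` package with a `matillion.py` module, but that is worth confirming rather than assuming.

```python
import zipfile
from typing import List


def modules_in_wheel(wheel_path: str, package: str) -> List[str]:
    """Return the .py files found under `package/` inside a wheel."""
    prefix = package.rstrip("/") + "/"
    # A wheel is a regular zip archive, so zipfile can read it directly.
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if name.startswith(prefix) and name.endswith(".py")
        )


# Example call, using the wheel path from the tree above:
# modules_in_wheel("dist/matillion-1.0-py3-none-any.whl", "matillion")
```

If the module that defines the source class is missing from the result, the `package_dir` / `find_packages` settings in setup.py would be the first thing to look at.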

*2. I have adapted the Dockerfile from datahub-actions as shown in my screenshot. Then I built and pushed the image.*
*3. I have tested whether my matillion package works by running my image interactively and successfully importing the package.*
*4. I have replaced the datahub-actions image with my custom image*

```datahub-actions:
    container_name: datahub-actions
    hostname: actions
    image: f43nris/matillion-custom-metadata-source:latest
    env_file: datahub-actions/env/docker.env
    depends_on:
      datahub-gms:
        condition: service_healthy```
and re-ran datahub docker quickstart --quickstart-compose-file docker/docker-compose-alternative.yml

Still, when I open the DataHub CLI and add a custom ingestion source

```source:
    type: matillion.matillion.MyCustomSource
sink:
    type: datahub-rest
    config:
        server: 'http://datahub-gms:8080'```
I just get the “Pending” status.
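
The source type in the recipe is a dotted path, so one thing worth checking is whether that exact path resolves to a class in the Python environment that runs the ingestion. The sketch below is not DataHub code; it is a standard-library approximation of "import the module, then look up the attribute", and the helper name `resolve_dotted_path` is hypothetical.

```python
import importlib
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Import `pkg.module.Attr` and return the object named `Attr`."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    # Raises ModuleNotFoundError if the package is not installed here.
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module does not define the class.
    return getattr(module, attr_name)


# Inside the custom actions container one might run:
# resolve_dotted_path("matillion.matillion.MyCustomSource")
```

A `ModuleNotFoundError` or `AttributeError` here would point at packaging or naming, while a clean result would suggest the problem lies elsewhere, for example in how the run request reaches the container.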

Can anybody help me ASAP? I have been working on this for so long already.

![attachment](https://files.slack.com/files-pri/TUMKD5EGJ-F05S9RV6K9T/bildschirmfoto_2023-09-14_um_16.32.50.png?t=xoxe-973659184562-6705490291811-6708051934148-dd1595bd5f63266bc09e6166373c7a3c)
![attachment](https://files.slack.com/files-pri/TUMKD5EGJ-F05SCQDM8RG/image.png?t=xoxe-973659184562-6705490291811-6708051934148-dd1595bd5f63266bc09e6166373c7a3c)

Hi David! Thanks for such a detailed breakdown of what you have tried :slight_smile: <@UV14447EU> any ideas of what might be happening here?

My suspicion is that the datahub-actions container is not able to consume the “run ingestion” events. To confirm this, inspect the Docker environment and the datahub-actions container logs, and check whether the components are communicating correctly.
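
As a rough way to go through the container logs, the sketch below pulls recent output with `docker logs` and keeps only lines that mention a problem. The container name `datahub-actions` comes from the compose snippet earlier in the thread; the function names and the keyword list are assumptions chosen for illustration, not anything DataHub-specific.

```python
import subprocess
from typing import Iterable, List


def fetch_container_logs(container: str, tail: int = 500) -> List[str]:
    """Return the last `tail` log lines of a container via `docker logs`."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container],
        capture_output=True,
        text=True,
        check=True,
    )
    # Container stdout and stderr arrive on separate streams; keep both.
    return (result.stdout + result.stderr).splitlines()


def matching_lines(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep lines containing any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


# Example call (keywords are an assumed starting point):
# matching_lines(fetch_container_logs("datahub-actions"),
#                ["error", "exception", "traceback"])
```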

I don’t usually run Docker Compose through quickstart, but you need to make sure the custom datahub-actions container is on the same Docker network as the other containers. If it is not, I believe it will not be able to consume the broker events or make calls to GMS.
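
A minimal way to test this from inside the actions container is a plain TCP connection attempt to each dependency. In the sketch below, `datahub-gms` on port 8080 is taken from the recipe earlier in the thread; the Kafka address `broker:29092` is an assumption and should be replaced with whatever the compose file actually defines. The helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Targets to try from inside the actions container:
#   can_connect("datahub-gms", 8080)   # GMS address from the recipe
#   can_connect("broker", 29092)       # assumed Kafka address, adjust as needed
```

A `False` for a hostname that other containers can reach would be consistent with the custom container sitting on a different Docker network.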