Debugging Data-Ingestion Code for MySQL Data with Assistance on Creating a Pipeline Instance for Debugging

Original Slack Thread

Hello everyone, I want to debug the data-ingestion code to understand how it works when ingesting MySQL data, but I can’t find the corresponding main program — I only found mysql.py. Can anyone help?

This is a good question… I also couldn’t find much information on how to debug each component effectively… here is what I found for your reference: https://datahubproject.io/docs/next/metadata-ingestion/developing|ref . Please share with us if you make any progress. :handshake:

:handshake:thanks <@U05B3RG1PD1>

The way I usually debug is the following:

• Set a breakpoint in the connector you want to debug (e.g., mysql.py)
• Define your recipe as a dictionary
• Create a https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py#L160|Pipeline instance with this config
• Call the pipeline’s run()
• Execute your script
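The steps above can be sketched roughly like this (the host, credentials, and database name are placeholders you'd replace with your own; the recipe shape follows the standard source/sink layout from the DataHub docs):

```python
# Hypothetical recipe for debugging the MySQL connector locally.
# All connection values below are placeholders, not real defaults.
recipe = {
    "source": {
        "type": "mysql",  # resolves to the mysql.py connector
        "config": {
            "host_port": "localhost:3306",
            "username": "datahub",
            "password": "datahub",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},  # local GMS
    },
}


def run_debug_pipeline(recipe: dict) -> None:
    """Create a Pipeline from the recipe dict and run it.

    Requires acryl-datahub with the mysql plugin installed; with a
    breakpoint set in mysql.py, stepping into run() lands in the
    connector code.
    """
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)  # build source + sink from the dict
    pipeline.run()                      # your mysql.py breakpoint hits here
    pipeline.raise_from_status()        # surface any ingestion errors
```

Running `run_debug_pipeline(recipe)` under your debugger (or after calling `breakpoint()` in mysql.py) lets you step through the ingestion end to end.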

Thank you very much for your answer <@U049WUH4155>, but how do I create a https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py#L160|Pipeline instance with some config? Could you please give me an example?

I think this example suits your needs perfectly:

https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py#L4

Hi Tim <@U049WUH4155>, I successfully ingested data and sent it to the GMS on port 8080 by running this “programatic_pipeline.py”, but I want to know what part of the source code is called when ingesting data through the UI.

I can’t point you to the concrete line of code here, but as far as I know the UI runs exactly such a pipeline, with the UI input as the recipe, inside the DataHub actions container.

thank you <@U049WUH4155>