Hello everyone, I want to debug the data-ingestion code to understand how it works when ingesting MySQL data, but I can’t find the corresponding main program; I only found mysql.py. Can anyone help?
This is a good question… I also haven’t found much information on how to debug each component effectively… here is what I’ve got for your reference: https://datahubproject.io/docs/next/metadata-ingestion/developing|ref . Please share with us if you make any progress.
thanks <@U05B3RG1PD1>
The way I usually debug is the following:
• Set a breakpoint in the connector you want to debug (e.g., mysql.py)
• Define your recipe as a dictionary
• Create a https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py#L160|Pipeline instance with this config
• Call the pipeline’s run()
• Execute your script
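Putting those steps together, here is a minimal sketch of a programmatic run, assuming the Pipeline API linked above (Pipeline.create(...), run(), raise_from_status()). All the recipe values (host, credentials, database name, GMS address) are placeholder assumptions you’d replace with your own:

```python
# Minimal sketch of a programmatic ingestion run.
# All recipe values below are placeholders -- adjust to your setup.
recipe = {
    "source": {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",  # placeholder
            "database": "my_db",            # placeholder
            "username": "datahub",          # placeholder
            "password": "datahub",          # placeholder
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},  # your GMS endpoint
    },
}

def run_ingestion(recipe: dict) -> None:
    # Import inside the function so the recipe above can be inspected even
    # without DataHub installed; requires `pip install acryl-datahub[mysql]`.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)  # build the pipeline from the dict
    pipeline.run()                      # your breakpoint in mysql.py fires here
    pipeline.raise_from_status()        # fail loudly if ingestion had errors
```

Run the script under your debugger (or drop a `breakpoint()` inside mysql.py) and the connector code gets hit during `pipeline.run()`.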
Thank you very much for your answer <@U049WUH4155>, but how do I create a https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py#L160|Pipeline instance with a config? Could you please give me an example?
I think this example here perfectly suits your needs:
Hi Tim <@U049WUH4155>, I successfully ingested data and sent it to the GMS on port 8080 by running this “programatic_pipeline.py”, but I want to know what part of the source code is called when ingesting data through the UI.
I can’t point you to the exact line of code here, but as far as I know the UI runs exactly such a pipeline in the DataHub actions container, with the UI input as the recipe.
thank you <@U049WUH4155>