Troubleshooting API Endpoint for DataJobs Retrieval in DataHub

Original Slack Thread

Hi everyone,

I have a question about retrieving the inputs and outputs of an already defined dataJob. I assume the best way to do so is the GET API endpoint /entities/v1/latest. This works fine for retrieving info about datasets and dataFlows when passing their URN, but it doesn’t seem to work for dataJobs. For reference, I created a dataJob with the following URN (which I can verify in the URL of the DataHub UI). Using this URN gives me a 500 result code from the API: urn:li:dataJob:(urn:li:dataFlow:(Airflow,test_flow_id,DEV),TestETL)

The returned message is the following:
Failed to batch get entities with urns: [urn:li:dataJob:(urn:li:dataFlow:(Airflow,test_flow_id,DEV),TestETL)]
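
For anyone trying to reproduce the call, here is a minimal sketch of how the request URL could be built. Nested dataJob URNs contain parentheses and commas, so percent-encoding them rules out a malformed URL as a cause. The host, port, `/openapi` prefix, and the `urns` / `aspectNames` query parameter names are assumptions not stated in this thread, and `build_latest_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL: host, port and the /openapi prefix are not given in the thread.
GMS_BASE = "http://localhost:8080/openapi"


def build_latest_url(urn, aspect_names=()):
    """Build a GET URL for /entities/v1/latest with the URN percent-encoded.

    The query parameter names "urns" and "aspectNames" are assumptions.
    """
    params = [("urns", urn)] + [("aspectNames", name) for name in aspect_names]
    return f"{GMS_BASE}/entities/v1/latest?{urlencode(params)}"


job_urn = "urn:li:dataJob:(urn:li:dataFlow:(Airflow,test_flow_id,DEV),TestETL)"
url = build_latest_url(job_urn, ["dataJobInfo", "dataJobInputOutput"])
print(url)
```

If the encoded URL still returns a 500, the problem is more likely on the server side than in how the URN is passed.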

Could it be that the API doesn’t support the retrieval of dataJob objects?

Edit: I’m using version 0.12.0

Edit2: It seems that this issue only occurs when I retrieve the dataJobInfo aspect; that is the one that breaks the call. When I retrieve only dataJobInputOutput, it works fine. How can this be the case? I suspect it has something to do with the type parameter passed when instantiating DataJobInfoClass. I use the plain string “AIRFLOW”, and this seems to cause the issue.
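
One way to narrow this down is to request each aspect on its own and compare the status codes. The following is a rough sketch of that idea, not an official client: the base URL and the `urns` / `aspectNames` parameter names are assumptions, and `http_status` and `probe_aspects` are hypothetical helpers.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Assumed base URL: host, port and the /openapi prefix are not given in the thread.
GMS_BASE = "http://localhost:8080/openapi"


def http_status(url):
    """Return the HTTP status code of a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error responses; the code is still what we want.
        return err.code


def probe_aspects(urn, aspect_names, get_status=http_status):
    """Request each aspect separately and map aspect name -> status code."""
    results = {}
    for name in aspect_names:
        query = urlencode([("urns", urn), ("aspectNames", name)])
        results[name] = get_status(f"{GMS_BASE}/entities/v1/latest?{query}")
    return results
```

Against a running instance, `probe_aspects(job_urn, ["dataJobInfo", "dataJobInputOutput"])` would be expected to show a 500 only for dataJobInfo if the behaviour described above holds. Connection failures are not caught here and will raise `urllib.error.URLError`.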

Hey Maarten! Thanks for your patience here - <@U01GCJKA8P9> might be able to provide some guidance!

Hi there! This appears to be a bug in how we handle that type field, as you mentioned. On the surface, having a type of AIRFLOW should not cause any problems. Looping in some folks who have worked on this API recently.

In the meantime, do you mind providing the GMS logs related to this error? https://datahubproject.io/docs/how/extract-container-logs/