Custom properties reset to empty JSON on dataset ingestion: seeking solutions and preventative measures

Original Slack Thread

Hi everyone, I just discovered a weird issue: the custom properties for a dataset get reset to an empty JSON object every time that dataset is ingested. Has anyone else run into this? Is it a known bug, and is there anything I can do to prevent it? For context, I'm running DataHub v0.10.4 and ingesting a Greenplum schema with a SQLAlchemy recipe.

<@U01GZEETMEZ> Any idea on this?

Our ingestion sources assume they are fully responsible for a dataset's custom properties, so they always write that field, even if that overwrites values set elsewhere.
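To make the failure mode concrete, here is a toy model (plain Python, not DataHub's code) of two ways a write can treat stored custom properties. With overwrite semantics, whatever the source emits replaces what is stored, so an empty dict wipes everything; with patch semantics, incoming keys are merged over the stored ones. The function name and property values are hypothetical, for illustration only.

```python
from typing import Dict


def write_custom_properties(
    stored: Dict[str, str],
    incoming: Dict[str, str],
    semantics: str = "OVERWRITE",
) -> Dict[str, str]:
    """Toy model: return the properties stored after one ingestion write.

    OVERWRITE: incoming replaces stored wholesale (empty incoming -> empty result).
    PATCH: incoming keys are merged over stored; keys not mentioned survive.
    """
    if semantics == "OVERWRITE":
        return dict(incoming)
    if semantics == "PATCH":
        merged = dict(stored)
        merged.update(incoming)
        return merged
    raise ValueError(f"unknown semantics: {semantics}")


# Properties previously attached by an external process (illustrative values).
stored = {"owner_team": "analytics", "retention_days": "30"}

# The ingestion source emits its own, empty, custom properties for the table.
print(write_custom_properties(stored, {}))           # {}
print(write_custom_properties(stored, {}, "PATCH"))  # {'owner_team': 'analytics', 'retention_days': '30'}
```

Under this model, re-ingestion with overwrite semantics reproduces the "reset to empty JSON" symptom from the original question.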

The best fix depends on where your custom properties come from. If they're set programmatically, you can use (or build) a transformer that attaches them during ingestion.
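A minimal sketch of the lookup side of such a transformer, assuming the external process keeps its properties in a store keyed by dataset URN. DataHub's docs describe a generic add_dataset_properties transformer that calls a user-supplied class (extending AddDatasetPropertiesResolverBase) with a get_properties_to_add(entity_urn) method; the class below only mirrors that shape and does not import DataHub, so check the transformer docs for your version (v0.10.4) before relying on it. EXTERNAL_PROPERTIES, the URN, and the property values are hypothetical.

```python
from typing import Dict

# Hypothetical stand-in for wherever the external process keeps its
# properties (a database, an API, a config file), keyed by dataset URN.
EXTERNAL_PROPERTIES: Dict[str, Dict[str, str]] = {
    "urn:li:dataset:(urn:li:dataPlatform:greenplum,analytics.public.orders,PROD)": {
        "owner_team": "analytics",
        "retention_days": "30",
    },
}


class ExternalPropertiesResolver:
    """Returns the custom properties to attach to one dataset.

    Mirrors the resolver shape (URN in, str->str dict out). In a real
    transformer you would subclass DataHub's AddDatasetPropertiesResolverBase.
    """

    def get_properties_to_add(self, entity_urn: str) -> Dict[str, str]:
        # Datasets the external process doesn't know about get no extra
        # properties, rather than raising an error mid-ingestion.
        return dict(EXTERNAL_PROPERTIES.get(entity_urn, {}))
```

If this matches your setup, the class would be referenced from the recipe by its import path (the docs show an add_properties_resolver_class option for this), so the properties are re-attached on every run instead of being wiped.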

Oh okay. Since DataHub has a Custom Properties guide, I assumed this wasn't the expected behavior. We set the properties programmatically via an external process, so a custom transformer could be a good idea. As a short-term fix, I've found that adding the built-in simple_add_dataset_properties transformer to my recipe with semantics set to PATCH forces the desired behavior. It does require setting a dummy property in the transformer, though.
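The short-term workaround described above, sketched as a recipe built in Python. The structure (source, transformers, sink) is the same as in a YAML recipe; the connection URI, server URL, and the dummy property name and value are hypothetical placeholders, and the SQLAlchemy source options shown (platform, connect_uri) are the minimal ones assumed for a Greenplum connection.

```python
import json

# Hypothetical connection details; replace with your own.
GREENPLUM_URI = "postgresql+psycopg2://user:password@greenplum-host:5432/analytics"

recipe = {
    "source": {
        "type": "sqlalchemy",
        "config": {
            "platform": "greenplum",
            "connect_uri": GREENPLUM_URI,
        },
    },
    "transformers": [
        {
            "type": "simple_add_dataset_properties",
            "config": {
                # PATCH: merge with properties already stored in DataHub
                # instead of replacing them.
                "semantics": "PATCH",
                # Dummy property; per the thread, the workaround needed at
                # least one property set here. Name and value are hypothetical.
                "properties": {"ingestion_patch_marker": "true"},
            },
        }
    ],
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}

# Print the recipe; the same structure can be written out as YAML.
print(json.dumps(recipe, indent=2))
```

If you drive ingestion from Python rather than the CLI, a dict like this can be passed to Pipeline.create from datahub.ingestion.run.pipeline and then run; otherwise, transcribe it into the YAML recipe.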

Thank you for the response!