<@U06TM7M40RK> How do I enrich the data from a Glue catalog source?
To enrich data from a Glue catalog source in DataHub, you can follow these general steps:
- Configure the Ingestion Source: Set up your ingestion recipe to pull metadata from the Glue catalog. This involves specifying the Glue catalog as the source in your YAML configuration file.
- Enable Profiling: You can enable data profiling in your ingestion configuration to gather additional metadata about your datasets. This can include statistics like column distributions, null counts, etc.
- Use Custom Metadata: If you need to add custom metadata, you can use DataHub’s extensibility features to define and ingest custom metadata aspects.
- Leverage DataHub’s Lineage and Tagging Features: You can enrich your data by adding lineage information and tags to your datasets. This can be done through the DataHub UI or programmatically via APIs.
- Integrate with Other Data Sources: You can combine metadata from Glue with other data sources supported by DataHub to create a more comprehensive metadata repository.
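As a rough illustration of the first two steps plus tagging at ingestion time, here is a minimal sketch that builds a recipe as a Python dict and runs it with DataHub's `Pipeline` class. The region, server URL, and tag name are placeholders, and the `profiling.enabled` flag and `simple_add_dataset_tags` transformer are assumed to be available in your DataHub version, so check the Glue source docs before relying on them.

```python
from typing import Any, Dict


def build_glue_recipe(aws_region: str, gms_server: str) -> Dict[str, Any]:
    """Build an ingestion recipe (same shape as the YAML file) for a Glue source."""
    return {
        "source": {
            "type": "glue",
            "config": {
                "aws_region": aws_region,
                # Assumption: profiling toggle as documented for the Glue source;
                # verify the exact profiling options for your version.
                "profiling": {"enabled": True},
            },
        },
        "transformers": [
            {
                # Adds a fixed tag to every dataset emitted by the source.
                "type": "simple_add_dataset_tags",
                # Placeholder tag URN, not taken from the thread.
                "config": {"tag_urns": ["urn:li:tag:glue-enriched"]},
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; needs the `acryl-datahub[glue]` package and AWS credentials."""
    # Imported lazily so the recipe can be built and inspected without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    recipe = build_glue_recipe("us-east-1", "http://localhost:8080")
    print(recipe["source"])
    # run_recipe(recipe)  # uncomment once DataHub and AWS access are configured
```

The same dictionary can be written as a YAML recipe and run with `datahub ingest -c <recipe file>` if you prefer the CLI.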
For specific configuration details, you can refer to the source configuration documentation ("Iceberg | DataHub") and the Configuration Details page ("Glue | DataHub").
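For the programmatic tagging route mentioned in the steps above, a sketch using the DataHub Python emitter might look like the following. The server URL, table name, and tag name are hypothetical, and the `database.table` naming for Glue datasets is an assumption you should confirm against the URNs your ingestion actually produces.

```python
def tag_glue_table(gms_server: str, table_name: str, tag_name: str, env: str = "PROD") -> str:
    """Attach a tag to a Glue dataset and return the dataset URN that was written to."""
    # Imported lazily so this module loads even where DataHub is not installed.
    from datahub.emitter.mce_builder import make_dataset_urn, make_tag_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

    dataset_urn = make_dataset_urn(platform="glue", name=table_name, env=env)
    tags = GlobalTagsClass(tags=[TagAssociationClass(tag=make_tag_urn(tag_name))])

    # Note: emitting the globalTags aspect replaces the dataset's existing tags.
    # Read and merge the current aspect first if you need to preserve them.
    mcp = MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=tags)
    DatahubRestEmitter(gms_server=gms_server).emit(mcp)
    return dataset_urn


if __name__ == "__main__":
    # Hypothetical values; requires a reachable DataHub GMS endpoint.
    print(tag_glue_table("http://localhost:8080", "my_database.my_table", "glue-enriched"))
```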
Would you like more detailed steps on any specific part of the process?
Sources:
- https://github.com/datahub-project/datahub/issues/4399
- https://datahubspace.slack.com/archives/C029A3M079U/p1723713743.718979
Hint: Mention @runllm in the thread for followups.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps!