Enriching Data from Glue Catalog Source in DataHub

Original Slack Thread

<@U06TM7M40RK> How do I enrich the data from the Glue catalog source?

To enrich the metadata ingested from an AWS Glue Data Catalog source in DataHub, you can follow these general steps:

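  1. Configure the Ingestion Source: Set up an ingestion recipe that pulls metadata from the Glue catalog. In the YAML recipe, this means setting the source `type` to `glue`, providing the AWS region (`aws_region`) and credentials if they are not picked up from the default AWS credential chain, and optionally restricting which databases and tables are ingested.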

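  2. Enable Profiling: Turn on profiling in the recipe to attach dataset statistics such as row counts, null counts, and column min/max values. The Glue source builds these profiles from statistics already stored in the Glue table and column parameters rather than by scanning the data, so the statistics need to exist in Glue first.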

  3. Use Custom Metadata: For simple key-value metadata, you can attach custom properties to a dataset; for richer, typed metadata, you can use DataHub's extensibility features to define and ingest custom metadata aspects.

  4. Leverage DataHub's Lineage and Tagging Features: Add lineage and tags to your datasets through the DataHub UI or programmatically via the Python SDK or the GraphQL API. The Glue source can also derive lineage from Glue ETL jobs on its own through its `extract_transforms` option.

  5. Integrate with Other Data Sources: Ingest related systems that DataHub supports (for example, Athena or S3) alongside Glue, so the catalog describes your datasets from more than one angle.
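Steps 1 and 2 above can be sketched in Python with DataHub's programmatic pipeline API, which takes the same structure as a YAML recipe. This is a minimal sketch, assuming the `acryl-datahub[glue]` package and a DataHub GMS at `http://localhost:8080`; the region, the database name `sales_db`, and the helper names `build_glue_recipe` and `run_recipe` are illustrative. The profiling section only sets `enabled`; check the Glue source documentation for your DataHub version for the keys that name which Glue parameters hold each statistic.

```python
"""Sketch: build a DataHub ingestion recipe for the Glue source and run it.

Assumes acryl-datahub[glue] for the actual run; building the recipe dict
needs only the standard library.
"""
import json
import re
from typing import Any, Dict, List, Optional

try:
    # Programmatic equivalent of `datahub ingest -c recipe.yml`.
    from datahub.ingestion.run.pipeline import Pipeline
except ImportError:  # acryl-datahub not installed; recipe building still works
    Pipeline = None


def build_glue_recipe(
    aws_region: str,
    gms_server: str,
    databases: Optional[List[str]] = None,
    profiling: bool = True,
) -> Dict[str, Any]:
    """Return a recipe dict with the same shape as the YAML recipe (hypothetical helper)."""
    config: Dict[str, Any] = {
        "aws_region": aws_region,
        # Also extract Glue ETL jobs, which the source uses for job-level lineage.
        "extract_transforms": True,
        # Step 2: profiles come from statistics already stored in Glue parameters.
        "profiling": {"enabled": profiling},
    }
    if databases:
        # Allow-list of regexes; anchor each name so "sales" does not match "sales_archive".
        config["database_pattern"] = {
            "allow": [f"^{re.escape(db)}$" for db in databases]
        }
    return {
        "source": {"type": "glue", "config": config},
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process and raise if ingestion reported failures."""
    if Pipeline is None:
        raise RuntimeError("install acryl-datahub[glue] to run the pipeline")
    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    # "sales_db" is a hypothetical database name.
    recipe = build_glue_recipe("us-east-1", "http://localhost:8080", ["sales_db"])
    print(json.dumps(recipe, indent=2))
    # run_recipe(recipe)  # uncomment once AWS credentials and DataHub GMS are reachable
```

Writing the same dict out as YAML and running `datahub ingest -c recipe.yml` should be equivalent; the Python form is handy when the list of databases comes from another system.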

For specific configuration details, you can refer to the DataHub Iceberg source configuration documentation and the DataHub Glue source configuration details.
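For step 4, here is a minimal sketch of the programmatic route using DataHub's Python REST emitter. It assumes `acryl-datahub` is installed, that the Glue source was ingested without a `platform_instance` (so dataset names are `database.table`), and that GMS is at `http://localhost:8080`; the tables `orders_clean` and `orders_raw`, the tag `pii`, and the helper names are hypothetical.

```python
"""Sketch: tag a Glue dataset and record upstream lineage via DataHub's Python emitter.

Assumes acryl-datahub is installed for the emit step; the URN helpers are pure Python.
"""
from typing import List


def glue_dataset_urn(database: str, table: str, env: str = "PROD") -> str:
    # Same string as make_dataset_urn("glue", f"{database}.{table}", env)
    # when no platform_instance is configured.
    return f"urn:li:dataset:(urn:li:dataPlatform:glue,{database}.{table},{env})"


def tag_urn(name: str) -> str:
    return f"urn:li:tag:{name}"


def emit_tags_and_lineage(
    gms_server: str, dataset_urn: str, tags: List[str], upstream_urns: List[str]
) -> None:
    # Imported here so the helpers above work without acryl-datahub installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        GlobalTagsClass,
        TagAssociationClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    emitter = DatahubRestEmitter(gms_server=gms_server)
    aspects = [
        GlobalTagsClass(tags=[TagAssociationClass(tag=tag_urn(t)) for t in tags]),
        UpstreamLineageClass(
            upstreams=[
                UpstreamClass(dataset=u, type=DatasetLineageTypeClass.TRANSFORMED)
                for u in upstream_urns
            ]
        ),
    ]
    for aspect in aspects:
        # Each emit upserts the whole aspect, replacing any tags or upstreams
        # previously stored in that aspect on the dataset.
        emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=aspect))


if __name__ == "__main__":
    # Hypothetical tables: orders_clean is derived from orders_raw.
    target = glue_dataset_urn("sales_db", "orders_clean")
    upstream = glue_dataset_urn("sales_db", "orders_raw")
    print(target)
    print(upstream)
    # emit_tags_and_lineage("http://localhost:8080", target, ["pii"], [upstream])
```

Because emitting an aspect replaces it wholesale, this approach suits datasets whose tags and upstreams are managed entirely from code; where tags or lineage are also added elsewhere, reading the current aspect first (or using the UI) avoids overwriting them.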

Would you like more detailed steps on any specific part of the process?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><