Automating Data Ingestion and Tagging with DataHub's Transformer Feature

user-3 · March 4, 2024, 4:06pm

New to DataHub - is the GMS able to infer tags on ingestion or do I need to preload tags?

user-2 · March 4, 2024, 4:06pm

(coworker of Christian’s here) Also, is there any capability to automate data ingest/parse/cleansing along with tagging? I think the ‘transformer’ feature is what we might need for auto-tagging, but our boss seems convinced DataHub can also handle ingest/parse/cleansing and routing to the final repo destination, which I am not seeing.

user-1 · March 4, 2024, 4:06pm

Hi <@U059X6T0CSV> & <@U05N53BMLBT>! Great to have you with us in the Community :teamwork:

Here are the options we currently support for auto-ingesting tags:
• Extract from source during ingestion - for many of our ingestion sources (Airflow, Snowflake, dbt, etc.), we automatically extract existing tags from the source system & apply them in DataHub. Our Source docs have details about what is available for each source; for example, the dbt source docs describe how we extract tags/owners/etc. from dbt’s meta block
• <https://datahubproject.io/docs/metadata-ingestion/docs/transformer/intro/|Transformers> - <@U05N53BMLBT> you’re exactly right - transformers are a way to auto-apply tags/terms/owners/etc. during ingestion if they don’t exist in the source
• Actions Framework - this is the most dynamic & customizable way for you to apply tags as your sources evolve; check out <https://datahubproject.io/docs/actions/guides/developing-an-action|this guide> and <https://www.youtube.com/watch?v=lrx8LFbe7w0|Hyejin’s demo> of what you can do with it!
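To make the transformer option above a bit more concrete, here is a rough sketch of an ingestion recipe built as a Python dict. It assumes the `simple_add_dataset_tags` and `pattern_add_dataset_tags` transformer types described in the transformer docs, a `datahub-rest` sink, and the `acryl-datahub` package for actually running it. The helper names (`make_tag_urn`, `build_recipe`, `run_recipe`), the tag names, the regex, the GMS address and the placeholder Snowflake source are all made up for illustration, so check the docs for your version before relying on any of it.

```python
from typing import Any, Dict, List


def make_tag_urn(tag: str) -> str:
    """Build a tag URN of the form urn:li:tag:<name>."""
    return f"urn:li:tag:{tag}"


def build_recipe(
    source: Dict[str, Any],
    default_tags: List[str],
    pattern_tags: Dict[str, List[str]],
    gms_server: str = "http://localhost:8080",  # assumed local GMS address
) -> Dict[str, Any]:
    """Assemble a recipe dict with tag transformers (hypothetical helper)."""
    transformers: List[Dict[str, Any]] = []
    if default_tags:
        # Adds the same tags to every dataset emitted by the source.
        transformers.append({
            "type": "simple_add_dataset_tags",
            "config": {"tag_urns": [make_tag_urn(t) for t in default_tags]},
        })
    if pattern_tags:
        # Adds tags only to datasets whose URN matches the regex key.
        transformers.append({
            "type": "pattern_add_dataset_tags",
            "config": {"tag_pattern": {"rules": {
                regex: [make_tag_urn(t) for t in tags]
                for regex, tags in pattern_tags.items()
            }}},
        })
    return {
        "source": source,
        "transformers": transformers,
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe; needs acryl-datahub installed and a reachable GMS."""
    # Imported lazily so build_recipe works without the package installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


recipe = build_recipe(
    # Placeholder source: fill in real connection settings before running.
    source={"type": "snowflake", "config": {}},
    default_tags=["NeedsDocumentation"],
    pattern_tags={".*finance.*": ["Finance"]},
)
```

Calling `run_recipe(recipe)` would hand the dict to the ingestion pipeline. Note that the transformers only attach metadata (tags here) to what the source emits; nothing in this sketch parses, cleanses or routes the underlying data.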
