Evaluating DataHub for a CDP Platform: Ingestion Connector Capabilities and Delta Update Processing

Original Slack Thread

:wave: Hello, team! I have a general question. We are evaluating DataHub for our CDP platform data product, and I would like to understand the ingestion capabilities of the DataHub connectors. We use Snowflake, S3 and an RDS (MySQL) database. Do DataHub connectors detect CRUD operations, e.g.:
new data assets (connections, databases, schemas, tables, columns and lineage)
updated data assets (connections, databases, schemas, tables, columns and lineage)
deleted data assets (connections, databases, schemas, tables, columns and lineage)?
We are building another Data Governance project for the client, based on the Atlan solution (which is based internally on Apache Atlas). For most systems there are no connectors available, so we have to develop a metadata database layer where we crawl all metadata and generate a delta (e.g. new assets to be created, existing assets to be updated, and existing assets to be deleted, for all data sets; we use a slowly changing dimension approach). My question is whether DataHub connectors can process delta updates out of the box.
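For readers unfamiliar with the delta approach described above, here is a minimal Python sketch of the idea: compare the previous crawl with the current one and classify each asset as created, updated, or deleted. The function name `compute_delta`, the asset identifiers, and the use of a per-asset fingerprint string are all illustrative assumptions; this is not the poster's implementation and not DataHub or Atlan code.

```python
from typing import Dict, List


def compute_delta(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify assets by comparing two crawl snapshots.

    Each snapshot maps a (hypothetical) asset identifier to a fingerprint,
    e.g. a hash of the asset's metadata. A changed fingerprint means the
    asset was updated between the two crawls.
    """
    # Present now but not before: to be created.
    created = sorted(key for key in current if key not in previous)
    # Present before but not now: to be deleted.
    deleted = sorted(key for key in previous if key not in current)
    # Present in both snapshots with a different fingerprint: to be updated.
    updated = sorted(
        key for key in current
        if key in previous and current[key] != previous[key]
    )
    return {"created": created, "updated": updated, "deleted": deleted}


# Example snapshots with made-up identifiers and fingerprints.
previous_crawl = {"db.sales.orders": "a1", "db.sales.customers": "b2", "db.sales.legacy": "c3"}
current_crawl = {"db.sales.orders": "a1", "db.sales.customers": "b9", "db.sales.returns": "d4"}

delta = compute_delta(previous_crawl, current_crawl)
```

In this example `db.sales.returns` is classified as created, `db.sales.customers` as updated, and `db.sales.legacy` as deleted, while the unchanged `db.sales.orders` appears in none of the lists.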

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

At the moment I am only reading the docs to understand the capabilities.

OK, I have found the answer to my question in the docs: https://datahubproject.io/docs/metadata-ingestion/docs/dev_guides/stateful#stale-entity-removal
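The linked section covers stale entity removal under stateful ingestion. As a rough sketch of what enabling it might look like, the ingestion recipe below is written as a Python dict. The `stateful_ingestion` keys and `pipeline_name` follow the DataHub docs as far as I know; the pipeline name, the server URL, and the omitted Snowflake connection settings are placeholders and assumptions, not values from this thread.

```python
# A sketch of a DataHub ingestion recipe with stale entity removal enabled.
# Connection settings for the source are intentionally left out; they are
# environment-specific and were not given in the thread.
recipe = {
    # Stateful ingestion needs a stable pipeline name so that consecutive
    # runs of the same pipeline can be compared. The name is a placeholder.
    "pipeline_name": "snowflake_cdp_ingestion",
    "source": {
        "type": "snowflake",
        "config": {
            # ... Snowflake account and credential settings go here ...
            "stateful_ingestion": {
                "enabled": True,
                # Assets seen in the previous run but missing from the
                # current run are treated as stale and removed.
                "remove_stale_metadata": True,
            },
        },
    },
    "sink": {
        "type": "datahub-rest",
        # Assumed local DataHub endpoint; adjust to your deployment.
        "config": {"server": "http://localhost:8080"},
    },
}

stateful_config = recipe["source"]["config"]["stateful_ingestion"]
```

The same structure can be written as a YAML recipe for the `datahub ingest` CLI. Whether a particular connector supports stateful ingestion should be checked in that connector's documentation.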