Deduplication of Kafka Topics in DataHub through Transformer or Ingestion Property

Original Slack Thread

Hi DataHub,
I am currently running CLI-based Kafka ingestion. When breaking changes are required, we create a new version of a topic, e.g. contract_master-entity_v1 and contract_master-entity_v2. Both topics are ingested, but only the latest should be present in DataHub, because things such as Avro schema tags, owners, etc. are pushed only to the latest schema version in the Schema Registry. Is there any way to dedup based on a pattern or some pre-combination logic? If not, would a new transformer be a good way to approach it, or should it rather be a property of the Kafka ingestion? Happy to contribute a PR. Thanks!
DataHub: v0.13.0
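
To make the "dedup based on a pattern" idea concrete, here is a minimal Python sketch of one possible approach, not an existing DataHub feature. It assumes versioned topics always end in `_v<integer>` (as in the example above) and that the topic list has already been obtained somehow. The names `split_latest` and `deny_patterns` are hypothetical helpers made up for illustration. The resulting regexes are meant to be pasted into a deny list such as the Kafka source's `topic_patterns.deny`; whether that option behaves this way in your DataHub version should be checked against the docs.

```python
import re
from collections import defaultdict

# Assumption: a versioned topic looks like "<base>_v<integer>".
VERSION_SUFFIX = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)$")


def split_latest(topics):
    """Split topics into (keep, drop).

    keep: unversioned topics plus the highest version of each base name.
    drop: every older version of a versioned topic.
    """
    versions = defaultdict(list)
    keep = []
    for topic in topics:
        match = VERSION_SUFFIX.match(topic)
        if match is None:
            # Topics without a version suffix are always kept.
            keep.append(topic)
        else:
            versions[match.group("base")].append(
                (int(match.group("version")), topic)
            )

    drop = []
    for found in versions.values():
        found.sort()  # ascending by numeric version, so v10 sorts after v2
        keep.append(found[-1][1])
        drop.extend(topic for _, topic in found[:-1])
    return sorted(keep), sorted(drop)


def deny_patterns(drop):
    """Build one anchored, escaped regex per topic that should be excluded."""
    return ["^" + re.escape(topic) + "$" for topic in drop]


if __name__ == "__main__":
    keep, drop = split_latest(
        ["contract_master-entity_v1", "contract_master-entity_v2", "orders"]
    )
    for pattern in deny_patterns(drop):
        print(pattern)
```

Generating the deny list outside of ingestion keeps the recipe itself static. A transformer or a dedicated ingestion property would avoid the extra step, but would need changes in DataHub.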

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)