Support for Adding Assertions at Pipeline and Dataflow Level

Original Slack Thread

Do we have support for adding assertions at the pipeline or dataflow level?

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

:information_source: We’re experimenting with a new AI assistant for questions in this channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

On adding assertions at the pipeline or dataflow level: the available documentation does not explicitly mention support for assertions at that level. Assertions in DataHub are generally focused on datasets and columns. For instance, you can create column assertions and custom SQL assertions.

If you are looking to monitor data quality at the pipeline or dataflow level, you might need to integrate with third-party tools like Great Expectations or dbt tests, which can then publish their results back to DataHub. This approach is suggested in a Slack conversation, where it was mentioned that open-source DataHub does not run assertions natively and requires external tools for evaluation.
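As a rough illustration of that idea, here is a minimal sketch of rolling dataset-level check results up into a per-pipeline status. It does not use the DataHub SDK: `CheckResult`, `dataset_urn`, `data_flow_urn`, and `roll_up` are hypothetical helpers, and the platform, table, and flow names are made up. Only the URN string layouts follow DataHub's documented dataset and dataFlow formats. In practice the results would come from whichever external tool evaluates the checks.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical container for one externally evaluated check
# (e.g. a Great Expectations expectation or a dbt test); not a DataHub class.
@dataclass(frozen=True)
class CheckResult:
    dataset_urn: str
    check_name: str
    passed: bool


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Builds a URN string in DataHub's dataset URN layout.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    # Builds a URN string in DataHub's dataFlow URN layout.
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def roll_up(
    flow_outputs: dict[str, list[str]],
    results: list[CheckResult],
) -> dict[str, bool]:
    """Map each flow URN to True only if no check failed on any of its output datasets."""
    failed = {r.dataset_urn for r in results if not r.passed}
    return {
        flow: not any(ds in failed for ds in outputs)
        for flow, outputs in flow_outputs.items()
    }


# Example data (all names are assumptions for illustration).
orders = dataset_urn("bigquery", "shop.orders")
users = dataset_urn("bigquery", "shop.users")
flow = data_flow_urn("airflow", "daily_load")

results = [
    CheckResult(orders, "row_count > 0", True),
    CheckResult(users, "id is not null", False),
]
status = roll_up({flow: [orders, users]}, results)
print(status)
```

Note that in this sketch a flow whose output datasets have no recorded checks counts as passing; whether that is the right default depends on how strict you want the pipeline-level view to be.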

Would you like more detailed information or examples on how to integrate and publish results from third-party tools to DataHub?