<@U06TM7M40RK> During dataset profiling, can we specify a custom query?
DataHub version: v0.14.0.2
In DataHub, dataset profiling is typically performed using predefined queries to collect statistics such as row counts, column statistics, and other metrics. However, the ability to specify a custom query for profiling is not explicitly mentioned in the provided data sources.
According to the <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/sql_profiles|SQL Profiling documentation>, profiling collects table-level and column-level statistics using SQL-based profilers that are enabled for SQL-based sources. The documentation does not mention the ability to specify custom queries for profiling.
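For reference, here is a minimal sketch of what turning on the built-in SQL profiler could look like, written as a Python dict that mirrors an ingestion recipe. The host, server URL, and table pattern are hypothetical placeholders, a real Hive or Trino source needs further connection fields (credentials, database/catalog), and the option names should be checked against the source docs for your version. Note that nothing here accepts a custom query; the recipe only controls whether profiling runs and which tables it covers.

```python
def build_profiling_recipe(source_type, host_port, table_allow_patterns):
    """Build a recipe-shaped dict with the built-in profiler switched on.

    Sketch only: connection details are placeholders and the exact
    option set depends on the source and the DataHub version.
    """
    return {
        "source": {
            "type": source_type,  # e.g. "hive" or "trino"
            "config": {
                "host_port": host_port,
                # Enables the predefined table- and column-level statistics.
                "profiling": {"enabled": True},
                # Restricts which tables get profiled (regex patterns).
                "profile_pattern": {"allow": list(table_allow_patterns)},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }


recipe = build_profiling_recipe(
    "trino",
    "trino.example.com:8080",      # hypothetical host
    [r"analytics\.public\..*"],    # hypothetical table pattern
)
```

A dict shaped like this would normally be written as a YAML recipe and run with the `datahub ingest` CLI.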
If you need custom profiling logic, you might consider <https://github.com/datahub-project/datahub/blob/master/docs/managed-datahub/observe/custom-sql-assertions.md|Custom SQL Assertions>, which let you define a custom SQL query and a condition to evaluate against its result. Note that this feature is documented under managed DataHub (Observe) and targets data quality checks rather than profiling.
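To illustrate the idea behind a custom SQL check (a query that returns a single number, compared against a condition), here is a small stand-alone sketch. It uses Python's built-in `sqlite3` as a stand-in database and does not call any DataHub API; the `orders` table, the query, and the `evaluate_custom_sql` helper are all hypothetical.

```python
import sqlite3


def evaluate_custom_sql(conn, query, expected_max):
    """Run a query that returns one numeric value and check it against an upper bound."""
    (value,) = conn.execute(query).fetchone()
    return value, value <= expected_max


# In-memory stand-in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 7.5)],
)

# Custom metric: number of orders with a missing amount; the check expects none.
null_amounts, passed = evaluate_custom_sql(
    conn,
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL",
    expected_max=0,
)
print(f"null amounts: {null_amounts}, check passed: {passed}")
```

The same query-plus-condition pattern could be run on a schedule against Hive or Trino with an appropriate client, independently of DataHub's built-in profiler.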
Would you like more detailed information on how to set up Custom SQL Assertions or any other aspect of DataHub profiling?
Sources:
- <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/sql_profiles|SQL Profiling documentation>
- <https://github.com/datahub-project/datahub/blob/master/docs/managed-datahub/observe/custom-sql-assertions.md|Custom SQL Assertions>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> What about custom queries with Hive or Trino as a source?
The DataHub documentation I have ingested does not mention how to specify custom queries for Hive or Trino as a source during dataset profiling or ingestion. It would help to know what you want the custom queries to achieve, such as specific metrics or transformations, so we can explore alternative solutions. If this doesn't help, please reach out to the DataHub support team for further assistance.
By the way, I found these links that might be relevant to what you're looking for. Let me know if this helps:
- SQL Queries - DataHub
- Customizing Search in DataHub
- DataHub Hive Connector Integration
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you need a person to take a look.
Hope this helps! ><