Investigating Behavioral Issue in Trino Ingestion with Table and Schema Filters

Original Slack Thread

Hello Team - I would like to report a behavioral issue in Trino ingestion. Whenever we add a table name in the table filter along with a schema filter, the table name is not taken into account. Could you please look into this as a priority?
Let me know if you need more information.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

To address the issue you’re experiencing with Trino ingestion, where the table name is not being considered when using table and schema filters, we can look into a few potential areas based on the available data sources.

  1. Pattern Matching in Ingestion: According to the DataHub documentation for Snowflake ingestion, regex patterns are used to filter tables and schemas during ingestion. It’s important to ensure that the regex patterns specified for both the schema_pattern and table_pattern are correctly defined to match the intended tables and schemas.

  2. Known Issues and Fixes: There have been known issues with pattern matching in ingestion sources. For example, a <https://github.com/datahub-project/datahub/pull/9482|GitHub pull request> addressed a similar issue in ClickHouse ingestion, where the table_pattern was not working as expected. This might indicate a similar underlying issue with Trino ingestion.

  3. Behavioral Issues in Trino: There is a GitHub issue in datahub-project/datahub related to Trino ingestion, which discusses incorrect querying behavior when the underlying connector isn’t Hive. While this issue is closed, it highlights the importance of ensuring that the ingestion logic aligns with the specific connector being used.

To proceed, could you provide more details on the specific patterns you are using for the schema_pattern and table_pattern? Additionally, are there any error messages or logs that could provide more context on the issue?
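As a rough illustration of how allow/deny regex filters of this kind usually behave, here is a minimal standalone sketch. It does not import DataHub; `is_allowed` and `should_ingest` are hypothetical helpers, and the assumption that the table filter is tested against a qualified `schema.table` name (rather than the bare table name) is exactly that, an assumption to verify against the Trino source you are running.

```python
import re
from typing import Iterable


def is_allowed(name: str, allow: Iterable[str] = (".*",), deny: Iterable[str] = ()) -> bool:
    """Return True if `name` matches an allow regex and no deny regex.

    re.match anchors each pattern at the START of `name` only, so a
    pattern must account for any prefix that precedes the table name.
    """
    if any(re.match(p, name, re.IGNORECASE) for p in deny):
        return False
    return any(re.match(p, name, re.IGNORECASE) for p in allow)


def should_ingest(schema: str, table: str,
                  schema_allow: Iterable[str], table_allow: Iterable[str]) -> bool:
    # Assumption: the table filter sees the qualified name "schema.table".
    qualified = f"{schema}.{table}"
    return is_allowed(schema, schema_allow) and is_allowed(qualified, table_allow)


if __name__ == "__main__":
    # A bare table name does not match the qualified name under re.match.
    print(should_ingest("sales", "orders", ["sales"], ["orders"]))
    # A pattern written against the qualified name does.
    print(should_ingest("sales", "orders", ["sales"], [r"sales\.orders"]))
```

If the filter in your deployment is evaluated this way, a table pattern that omits the schema prefix would silently match nothing, which would look like the table filter being ignored.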

We are not using a regex for the table pattern; we are using the exact table name directly. We still see that the table name is not being picked up when running the stats queries.
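One thing worth checking here: a filter value is typically still interpreted as a regex even when it is a plain table name. The sketch below is a hypothetical helper (`exact_table_regex` is not part of DataHub) that turns an exact table name into a pattern that would match it with or without a catalog/schema prefix. Whether the Trino source compares against `catalog.schema.table`, `schema.table`, or the bare name is an assumption to confirm in your version.

```python
import re


def exact_table_regex(table_name: str) -> str:
    """Build a regex matching `table_name` exactly, with an optional dotted prefix.

    re.escape guards against regex metacharacters in the name, and the
    trailing `$` stops the pattern from also matching longer names.
    """
    return rf"(?:.*\.)?{re.escape(table_name)}$"


if __name__ == "__main__":
    pattern = exact_table_regex("orders")
    for candidate in ("orders", "sales.orders", "hive.sales.orders", "hive.sales.orders_archive"):
        print(candidate, bool(re.match(pattern, candidate)))
```

Trying the generated pattern in the table filter is a quick way to tell a pattern-matching mismatch apart from a genuine bug in the stats (profiling) queries.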

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps!