Hi, I want to ingest our Redshift DataWarehouse Data. I am using Quickstart and the following recipe:
source:
type: redshift
config:
host_port: ‘…’
username: datahub_user
include_table_lineage: true
include_tables: true
include_views: true
is_serverless: true
profiling:
enabled: true
profile_table_level_only: true
stateful_ingestion:
enabled: true
password: ‘${Password_datahub_user_prod}’
database: nowdwh-prod-gold
table_lineage_mode: mixed
But the ingestion only recognises schemas entities of dwh and staging. The Datamarts have also tables, but they are not ingested/recognized by datahub (see picture attached). I am not really getting an error, so I do not know what to do…
To address the issue of certain schemas not being recognized during the ingestion process, there are a few potential areas to investigate:
Schema Pattern Configuration: Ensure that your ingestion recipe includes the correct schema patterns. If you want to include specific schemas, you can use the schema_pattern configuration to explicitly allow them. For example:
source:
type: redshift
config:
host_port: '....'
username: datahub_user
include_table_lineage: true
include_tables: true
include_views: true
is_serverless: true
profiling:
enabled: true
profile_table_level_only: true
stateful_ingestion:
enabled: true
password: '${Password_datahub_user_prod}'
database: nowdwh-prod-gold
table_lineage_mode: mixed
schema_pattern:
allow:
- 'dwh'
- 'staging'
- 'datamarts' # Add this line to include the 'datamarts' schema
Permissions: Verify that the datahub_user has the necessary permissions to access the datamarts schema. Lack of permissions could prevent DataHub from recognizing the tables within that schema.
Ingestion Logs: Check the ingestion logs for any warnings or messages that might indicate why the datamarts schema is not being ingested. This can provide clues if there are connectivity issues or misconfigurations.
Version Compatibility: Ensure that the version of DataHub you are using is compatible with the features you are trying to use, such as stateful ingestion and profiling.
If you would like more detailed steps or further assistance on a specific part of the process, please let me know!
Sources:
<Slack Slack Discussion> 0 button 0 button Hint: Mention @runllm in the thread for followups.
If the schema is recognized but the entities (such as views) within it are not, and there are no errors or permission issues in the logs, consider the following troubleshooting steps:
Include Views Configuration: Double-check that the include_views configuration is set to true in your ingestion recipe, which you have already done. Ensure that there are no additional filters or patterns that might be excluding views.
View Pattern Configuration: If you have specific patterns for views, ensure that they are correctly configured. You can use the view_pattern configuration to explicitly allow views from the datamarts schema. For example:
Check for Schema Binding: If the views are created with NO SCHEMA BINDING, this might cause issues with metadata extraction. According to a <Issues · datahub-project/datahub · GitHub issue>, views created with NO SCHEMA BINDING may not have their column information fetched correctly. Ensure that the views are not created with this option, or adjust permissions as needed.
Debugging and Logging: Since the logs do not show errors, consider increasing the verbosity of the logs to debug further. You can set the DATAHUB_DEBUG environment variable to true to get more detailed logs: