Troubleshooting Redshift Data Ingestion and Schema Recognition

Original Slack Thread

Hi, I want to ingest our Redshift DataWarehouse Data. I am using Quickstart and the following recipe:
source:
type: redshift
config:
host_port: ‘…’
username: datahub_user
include_table_lineage: true
include_tables: true
include_views: true
is_serverless: true
profiling:
enabled: true
profile_table_level_only: true
stateful_ingestion:
enabled: true
password: ‘${Password_datahub_user_prod}’
database: nowdwh-prod-gold
table_lineage_mode: mixed

But the ingestion only recognises schemas entities of dwh and staging. The Datamarts have also tables, but they are not ingested/recognized by datahub (see picture attached). I am not really getting an error, so I do not know what to do…attachment

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

To address the issue of certain schemas not being recognized during the ingestion process, there are a few potential areas to investigate:

  1. Schema Pattern Configuration: Ensure that your ingestion recipe includes the correct schema patterns. If you want to include specific schemas, you can use the schema_pattern configuration to explicitly allow them. For example:

    source:
      type: redshift
      config:
        host_port: '....'
        username: datahub_user
        include_table_lineage: true
        include_tables: true
        include_views: true
        is_serverless: true
        profiling:
          enabled: true
          profile_table_level_only: true
        stateful_ingestion:
          enabled: true
        password: '${Password_datahub_user_prod}'
        database: nowdwh-prod-gold
        table_lineage_mode: mixed
        schema_pattern:
          allow:
            - 'dwh'
            - 'staging'
            - 'datamarts'  # Add this line to include the 'datamarts' schema
    
  2. Permissions: Verify that the datahub_user has the necessary permissions to access the datamarts schema. Lack of permissions could prevent DataHub from recognizing the tables within that schema.

  3. Ingestion Logs: Check the ingestion logs for any warnings or messages that might indicate why the datamarts schema is not being ingested. This can provide clues if there are connectivity issues or misconfigurations.

  4. Version Compatibility: Ensure that the version of DataHub you are using is compatible with the features you are trying to use, such as stateful ingestion and profiling.

If you would like more detailed steps or further assistance on a specific part of the process, please let me know!

Sources:

  • <Slack Slack Discussion> :+1: 0 button :-1: 0 button :bulb: Hint: Mention @runllm in the thread for followups.

<@U06TM7M40RK> so it recognises the schema, but not the entites (views) that come with the schema
The logs do not show any errors or permission issues

If the schema is recognized but the entities (such as views) within it are not, and there are no errors or permission issues in the logs, consider the following troubleshooting steps:

  1. Include Views Configuration: Double-check that the include_views configuration is set to true in your ingestion recipe, which you have already done. Ensure that there are no additional filters or patterns that might be excluding views.

  2. View Pattern Configuration: If you have specific patterns for views, ensure that they are correctly configured. You can use the view_pattern configuration to explicitly allow views from the datamarts schema. For example:

    source:
      type: redshift
      config:
        host_port: '....'
        username: datahub_user
        include_table_lineage: true
        include_tables: true
        include_views: true
        is_serverless: true
        profiling:
          enabled: true
          profile_table_level_only: true
        stateful_ingestion:
          enabled: true
        password: '${Password_datahub_user_prod}'
        database: nowdwh-prod-gold
        table_lineage_mode: mixed
        schema_pattern:
          allow:
            - 'dwh'
            - 'staging'
            - 'datamarts'
        view_pattern:
          allow:
            - 'datamarts.*'  # Ensure views in the 'datamarts' schema are included
    
  3. Check for Schema Binding: If the views are created with NO SCHEMA BINDING, this might cause issues with metadata extraction. According to a <Issues · datahub-project/datahub · GitHub issue>, views created with NO SCHEMA BINDING may not have their column information fetched correctly. Ensure that the views are not created with this option, or adjust permissions as needed.

  4. Debugging and Logging: Since the logs do not show errors, consider increasing the verbosity of the logs to debug further. You can set the DATAHUB_DEBUG environment variable to true to get more detailed logs:

    export DATAHUB_DEBUG=true
    datahub ingest -c your_ingestion_recipe.yml
    
  5. Manual Metadata Emission: If the issue persists, you can manually emit metadata for the views using the DataHub Python library as a workaround.

Would you like more detailed steps on any of these points or further assistance on a specific part of the process?

Sources:

Perfect - might be the NO SCHEMA BINDING thing - thanks for the git link :slightly_smiling_face:

Hi :slight_smile: ! Did this interaction resolve your issue or do you still require assistance ?

Please tag @ RunLLM if you still need further assistance !

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps ! ><