Understanding Case Sensitivity and Configuration Options in Vertica Data Ingestion

user-3 · October 7, 2024, 12:05am

Hi <@U06TM7M40RK>, how does case sensitivity play a role when ingesting a specific view from a specific schema in Vertica?

user-1 · October 7, 2024, 12:05am

Case sensitivity can significantly impact the ingestion process in DataHub, especially when dealing with specific views from specific schemas in Vertica. According to the <Vertica | DataHub ingestion documentation>, DataHub provides configuration options to handle case sensitivity during pattern matching for various entities, including tables and views.

Key Points on Case Sensitivity in Vertica Ingestion:

Regex Patterns for Views:
- You can specify regex patterns to filter views during ingestion using the view_pattern configuration. This pattern should match the entire view name in the format database.schema.view.
- The ignoreCase boolean parameter within view_pattern controls whether case is ignored during pattern matching. It defaults to True, so matching is case-insensitive unless you set it to False.
Configuration Example:
- To ingest views from a specific schema while ignoring case sensitivity, you can configure the view_pattern as follows:

view_pattern:
  allow: [ 'Customer.public.customer_view.*' ]
  deny: [ ]
  ignoreCase: True

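With this configuration, every view in the public schema of the Customer database whose name starts with customer_view is ingested, regardless of case. Because the dots in the pattern are unescaped, each one matches any character; write \. if you need to match a literal dot.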
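To see how allow/deny patterns and ignoreCase interact, here is a minimal stand-alone sketch using Python's re module. It is not DataHub's own code: the helper is_allowed is hypothetical, and it assumes patterns are applied with re.match from the start of the full database.schema.view name. The dots are escaped here so they match literally.

```python
import re


def is_allowed(name, allow, deny=(), ignore_case=True):
    """Hypothetical helper: True if name matches an allow regex and no deny regex."""
    flags = re.IGNORECASE if ignore_case else 0
    # A deny match always wins over an allow match.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


allow = [r"Customer\.public\.customer_view.*"]

# Case differs from the pattern, but ignore_case=True still matches.
print(is_allowed("customer.PUBLIC.Customer_View_Daily", allow))  # True

# The same name is rejected once matching is case-sensitive.
print(is_allowed("customer.PUBLIC.Customer_View_Daily", allow, ignore_case=False))  # False

# A view in a different schema never matches.
print(is_allowed("Customer.sales.customer_view", allow))  # False
```

If a view you expect is missing after ingestion, running its fully qualified name through a check like this is a quick way to tell whether the pattern or the casing is the cause.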

Impact on Column Level Lineage:
- As per the <Issues · datahub-project/datahub · GitHub issue on SQL Column Level Lineage Case Sensitivity>, case sensitivity can affect column-level lineage. If column names contain uppercase characters and the convert_urns_to_lowercase option is enabled, it may lead to mismatches in lineage.

Would you like more detailed information on configuring these patterns or addressing specific issues related to case sensitivity in your ingestion process?

Sources:

<Vertica | DataHub Ingestion Documentation>
<Issues · datahub-project/datahub · GitHub Issue on SQL Column Level Lineage Case Sensitivity>

user-3 · October 7, 2024, 12:05am

What does convert_urns_to_lowercase option do in an ingestion recipe <@U06TM7M40RK>?

user-1 · October 7, 2024, 12:05am

The convert_urns_to_lowercase option in an ingestion recipe is used to ensure that all URNs (Uniform Resource Names) are converted to lowercase during the ingestion process. This can help avoid issues related to case sensitivity, such as duplicate entries or mismatches in lineage and metadata.

Key Points on convert_urns_to_lowercase:

Purpose:
- The primary purpose of this option is to standardize the casing of URNs across different ingestion sources, which can help prevent inconsistencies and conflicts.
Configuration:
- This option can be set in the ingestion recipe for various data sources. When enabled, it converts all dataset URNs to lowercase.
Example Usage:
- Here is an example of how to use the convert_urns_to_lowercase option in a dbt ingestion recipe:

source:
  type: dbt
  config:
    manifest_path: /path/to/manifest.json
    catalog_path: /path/to/catalog.json
    target_platform: bigquery
    convert_urns_to_lowercase: true

Similarly, for a BigQuery ingestion recipe:

source:
  type: bigquery
  config:
    project_id: your_project_id
    convert_urns_to_lowercase: true

Impact:
- Enabling this option ensures that all URNs are consistently lowercased, which can help avoid issues where the same entity might be ingested with different cases (e.g., Customer vs. customer).
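The effect on URNs can be illustrated with a short sketch. The dataset_urn helper below is hypothetical and only mimics the general dataset URN layout (platform, name, environment); it assumes that lowercasing applies to the dataset name portion and is not DataHub's implementation.

```python
def dataset_urn(platform, name, env="PROD", lowercase=False):
    """Hypothetical helper: build a dataset URN string, optionally lowercasing the name."""
    if lowercase:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Two sources report the same view with different casing.
from_source_a = "Customer.public.Customer_View"
from_source_b = "customer.public.customer_view"

# Without lowercasing, the URNs differ, so the view appears as two entities.
print(dataset_urn("vertica", from_source_a) == dataset_urn("vertica", from_source_b))  # False

# With lowercasing, both names collapse to a single URN.
print(
    dataset_urn("vertica", from_source_a, lowercase=True)
    == dataset_urn("vertica", from_source_b, lowercase=True)
)  # True
```

This is also why the setting should be consistent across every source that refers to the same datasets: if only one side lowercases, the URNs stop matching again.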

References:

<dbt | DataHub DBT Ingestion Configuration>
<BigQuery | DataHub BigQuery Ingestion Configuration>
<https://github.com/datahub-project/datahub/pull/7418|GitHub Pull Request #7418>

Would you like more detailed steps or further assistance on any specific part of this process?

datahub_team · October 7, 2024, 12:05am

Hi! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look.

Hope this helps!
