Hi <@U06TM7M40RK>, how does case sensitivity play a role when ingesting a specific view from a specific schema in vertica?
Case sensitivity can significantly impact the ingestion process in DataHub, especially when dealing with specific views from specific schemas in Vertica. According to the <Vertica | DataHub ingestion documentation>, DataHub provides configuration options to handle case sensitivity during pattern matching for various entities, including tables and views.
Key Points on Case Sensitivity in Vertica Ingestion:
-
Regex Patterns for Views:
- You can specify regex patterns to filter views during ingestion using the
view_pattern
configuration. This pattern should match the entire view name in the formatdatabase.schema.view
. - The
ignoreCase
boolean parameter withinview_pattern
determines whether case sensitivity should be ignored during pattern matching. By default, this is set toTrue
, meaning case sensitivity is ignored.
- You can specify regex patterns to filter views during ingestion using the
-
Configuration Example:
- To ingest views from a specific schema while ignoring case sensitivity, you can configure the
view_pattern
as follows:
- To ingest views from a specific schema while ignoring case sensitivity, you can configure the
view_pattern:
allow: [ 'Customer.public.customer_view.*' ]
deny: [ ]
ignoreCase: True
This configuration will include all views starting with customer_view
in the public
schema of the Customer
database, regardless of the case.
- Impact on Column Level Lineage:
- As per the <Issues · datahub-project/datahub · GitHub issue on SQL Column Level Lineage Case Sensitivity>, case sensitivity can affect column-level lineage. If column names contain uppercase characters and the
convert_urns_to_lowercase
option is enabled, it may lead to mismatches in lineage.
- As per the <Issues · datahub-project/datahub · GitHub issue on SQL Column Level Lineage Case Sensitivity>, case sensitivity can affect column-level lineage. If column names contain uppercase characters and the
Would you like more detailed information on configuring these patterns or addressing specific issues related to case sensitivity in your ingestion process?
Sources:
- <Vertica | DataHub Ingestion Documentation>
- <Issues · datahub-project/datahub · GitHub Issue on SQL Column Level Lineage Case Sensitivity>
0 button
0 button
Hint: Mention @runllm in the thread for followups.
What does convert_urns_to_lowercase option do in an ingestion recipe <@U06TM7M40RK>?
The convert_urns_to_lowercase
option in an ingestion recipe is used to ensure that all URNs (Uniform Resource Names) are converted to lowercase during the ingestion process. This can help avoid issues related to case sensitivity, such as duplicate entries or mismatches in lineage and metadata.
Key Points on convert_urns_to_lowercase
:
-
Purpose:
- The primary purpose of this option is to standardize the casing of URNs across different ingestion sources, which can help prevent inconsistencies and conflicts.
-
Configuration:
- This option can be set in the ingestion recipe for various data sources. When enabled, it converts all dataset URNs to lowercase.
-
Example Usage:
- Here is an example of how to use the
convert_urns_to_lowercase
option in a DBT ingestion recipe:
- Here is an example of how to use the
source:
type: dbt
config:
manifest_path: /path/to/manifest.json
catalog_path: /path/to/catalog.json
target_platform: bigquery
convert_urns_to_lowercase: true
- Similarly, for a BigQuery ingestion recipe:
source:
type: bigquery
config:
project_id: your_project_id
convert_urns_to_lowercase: true
- Impact:
- Enabling this option ensures that all URNs are consistently lowercased, which can help avoid issues where the same entity might be ingested with different cases (e.g.,
Customer
vs.customer
).
- Enabling this option ensures that all URNs are consistently lowercased, which can help avoid issues where the same entity might be ingested with different cases (e.g.,
References:
- <dbt | DataHub DBT Ingestion Configuration>
- <BigQuery | DataHub BigQuery Ingestion Configuration>
- <https://github.com/datahub-project/datahub/pull/7418|GitHub Pull Request #7418>
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- <https://github.com/datahub-project/datahub/pull/7418|GitHub Pull Request #7418>
0 button
0 button
Hint: Mention @runllm in the thread for followups.
Hi ! Did this interaction resolve your issue or do you still require assistance ?
Please tag @ RunLLM if you still need further assistance !
Or tag @Datahub Community Support if you require a person to take a look
Hope this helps ! ><