I am trying to use SQL view in Snowflake and ingest sample data in Datahub. Column names was successfully extracted, but I still don’t see sample data in Datahub.
There is my config below:
type: snowflake
config:
account_id: <account_id>
include_views: true
profiling:
enabled: true
sample_size: 10
warehouse: <warehouse>
username: <username>
password: <password>
enable_stateful_profiling: false
role: ACCOUNTADMIN
database_pattern:
allow:
- '<pattern>*'
schema_pattern:
allow:
- '<pattern>*'```
I am using UI and Datahub version is `0.13.1.3`.
Thanks !
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g. 0.12.0)
What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a or so other folks in the Community know if you’re still looking for help on the topic!
To better assist you, could you please provide the following information?
Are you using the UI or CLI for ingestion?
Which DataHub version are you using? (e.g., 0.12.0)
What data source(s) are you integrating with DataHub? (e.g., BigQuery)
This information will help in diagnosing the issue more effectively.
Additionally, based on your provided configuration, it seems like you have enabled profiling and included views, but you might be missing some specific configurations for profiling views. According to <Slack Slack thread>, profiling for views might require some code modifications in the SnowflakeProfiler class.
Would you like more detailed guidance on how to modify the code to enable profiling for views?
Sources:
<Slack Thread on Profiling Views> 0 button 0 button Hint: Mention @runllm in the thread for followups.
Given your configuration and the issue you’re facing, it seems like the profiling for views might not be working as expected. Here are a few steps you can take to troubleshoot and potentially resolve the issue:
Check DataHub Version: Ensure you are using a compatible version of DataHub that supports the features you need. Profiling for views might have specific requirements or bugs fixed in later versions.
Enable Debug Logging: Enable debug logging to get more detailed insights into what might be going wrong during the ingestion process. This can help identify if there are any errors or warnings related to profiling views.
Verify Profiling Configuration: Double-check your profiling configuration to ensure it is correctly set up. For instance, make sure profiling.enabled is set to true and sample_size is appropriately configured.
Custom Modifications: As mentioned earlier, profiling for views might require custom modifications in the SnowflakeProfiler class. If you are comfortable with code changes, you can modify the ingestion framework to handle profiling for views specifically.
Here is a sample configuration snippet based on your provided configuration:
If these steps do not resolve the issue, you might want to reach out to the DataHub community on Slack for more specific guidance or check the <https://datahubproject.io/docs/0.13.1/components|DataHub documentation> for any updates or additional configuration options.
Would you like more detailed guidance on how to modify the SnowflakeProfiler class to enable profiling for views?
To enable profiling for Snowflake views and have the sample data appear in the DataHub UI, you will need to modify the SnowflakeProfiler class. Here are the steps to achieve this:
Locate the SnowflakeProfiler Class: The SnowflakeProfiler class is responsible for handling profiling in Snowflake. You can find it in the snowflake_profiler.py file.
Modify the Profiling Logic: Update the logic in the SnowflakeProfiler class to include views in the profiling process.
Rebuild and Deploy: After making the necessary changes, you will need to rebuild and deploy the ingestion framework.
Step-by-Step Guide
Step 1: Locate the SnowflakeProfiler Class
The SnowflakeProfiler class can be found in the following file:
Step 2: Modify the Profiling Logic
Edit the SnowflakeProfiler class to ensure it includes views in the profiling process. Here is an example modification:
class SnowflakeProfiler(GenericProfiler, SnowflakeCommonMixin):
def __init__(
self,
config: SnowflakeV2Config,
report: SnowflakeV2Report,
state_handler: Optional[ProfilingHandler] = None,
) -> None:
super().__init__(config, report, self.platform, state_handler)
self.config: SnowflakeV2Config = config
self.report: SnowflakeV2Report = report
self.logger = logger
self.database_default_schema: Dict[str, str] = dict()
def get_workunits(
self, database: SnowflakeDatabase, db_tables: Dict[str, List[SnowflakeTable]]
) -> Iterable[MetadataWorkUnit]:
if self.config.is_profiling_enabled():
self.config.options.setdefault(
"max_overflow", self.config.profiling.max_workers
)
if PUBLIC_SCHEMA not in db_tables:
self.database_default_schema[database.name] = list(db_tables.keys())[0]
profile_requests = []
for schema in database.schemas:
for table in db_tables[schema.name]:
# Include views in profiling
if (
not self.config.profiling.profile_external_tables
and table.type == "EXTERNAL TABLE"
):
logger.info(
f"Skipping profiling of external table {database.name}.{schema.name}.{table.name}"
)
self.report.profiling_skipped_other[schema.name] += 1
continue
# Add condition to include views
if table.type == "VIEW" or table.type == "BASE TABLE":
profile_request = self.get_profile_request(
table, schema.name, database.name
)
if profile_request is not None:
self.report.report_entity_profiled(profile_request.pretty_name)
profile_requests.append(profile_request)
if len(profile_requests) == 0:
return
yield from self.generate_profile_workunits(
profile_requests,
max_workers=self.config.profiling.max_workers,
db_name=database.name,
platform=self.platform,
profiler_args=self.get_profile_args(),
)
Step 3: Rebuild and Deploy
After making the changes, you will need to rebuild the ingestion framework and deploy it. Follow the standard build and deployment procedures for your environment.
Additional Configuration
Ensure your configuration file includes the necessary parameters to enable profiling for views:
To configure profiling for Snowflake views without modifying the code, you need to ensure that your ingestion configuration is correctly set up. Based on the provided data sources, here are the steps you should follow:
Ensure Profiling is Enabled: Make sure that profiling is enabled in your configuration.
Include Views in Profiling: Ensure that views are included in the profiling process.
Set Sample Size: Define the sample size for profiling.
Verify Permissions: Ensure that the Snowflake user has the necessary permissions to profile views.
Configuration Steps
Ensure Profiling is Enabled:
Your configuration should have the profiling.enabled set to true and sample_size defined.
Include Views in Profiling:
Make sure the include_views parameter is set to true in your configuration.
Set Sample Size:
Define the sample size for the profiling process.
Verify Permissions:
Ensure that the Snowflake user has the necessary permissions to access and profile views.
Here is an example configuration based on your requirements:
By following these steps, you should be able to configure profiling for Snowflake views and have the sample data appear in the DataHub UI without modifying the code.
Would you like more detailed steps or have any specific questions about this process?
Sources:
- [Slack Thread on Profiling Views](<https://datahubspace.slack.com/archives/CUMUWQU66/p1708696303.163829>)
- [Using Kubernetes for Scheduling Ingestion](<https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/schedule_docs/kubernetes.md>)
- [Snowflake Configuration Guide](<https://github.com/datahub-project/datahub/blob/master/docs/quick-ingestion-guides/snowflake/configuration.md>) :+1: 0 button :-1: 0 button :bulb: *Hint:* Mention @runllm in the thread for followups.
Based on the provided data sources, it appears that profiling and ingestion of sample data for SQL views in Snowflake might not be fully supported by default in DataHub. Profiling and sample data ingestion are generally more straightforward for tables.
Profiling SQL Views
From the Slack conversations:
Profiling for Views: There are indications that profiling for views might not be fully supported out-of-the-box. For example, in <Slack Slack thread>, a user mentions that profiling for views is not available by default and requires custom profiling.
Configuration for Profiling Tables and Views
While the configuration you provided is correct for enabling profiling, it is primarily designed for tables. Here is a summary of the configuration for profiling:
Ensure Permissions: Make sure that the Snowflake user has the necessary permissions to access and profile views. This includes SELECT permissions on the views.
Check DataHub Version: Ensure you are using a version of DataHub that supports the features you need. Profiling for views might have specific requirements or bugs fixed in later versions.
Enable Debug Logging: Enable debug logging to get more detailed insights into what might be going wrong during the ingestion process. This can help identify if there are any errors or warnings related to profiling views.
Conclusion
As per the available data sources, it seems that profiling for views might not be fully supported by default in DataHub. If you need to profile views, you might need to implement custom profiling logic or reach out to the DataHub community for more specific guidance.
Would you like more detailed guidance on how to implement custom profiling for views, or do you have any other questions?