My classification for Snowflake is not working on version 0.13.1.2. I tried checking the logs in debug mode, but nothing appears except 'num_tables_classification_failed': 11. How can I investigate further what the problem is?
<@U06TM7M40RK> Can you help with the questions in this thread?
To troubleshoot the issue with classification for Snowflake in DataHub version 0.13.1.2, you can follow these steps:
- Verify Configuration: Ensure that the classification configuration in your ingestion recipe is set up correctly. It should look something like this:

    source:
      type: snowflake
      config:
        account_id: <your-account-id>
        username: <your-username>
        password: <your-password>
        role: <your-role>
        warehouse: <your-warehouse>
        include_tables: true
        include_views: true
        profiling:
          enabled: true
        classification:
          enabled: true
          classifiers:
            - type: datahub
              config:
                confidence_level_threshold: 0.7

  Make sure that the classification and classifiers sections are nested under the source's config block.
- Check Privileges: Ensure that the Snowflake user has the necessary privileges. According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/snowflake/snowflake_pre.md|Snowflake Configuration in DataHub>, the user needs SELECT privileges on the tables for classification to work:

    grant select on all tables in database "<your-database>" to role datahub_role;
    grant select on future tables in database "<your-database>" to role datahub_role;
- Inspect Debug Logs: Run the ingestion with debug logs enabled to get more detailed information. You can enable debug logs by running the ingestion command with the --debug flag:

    datahub --debug ingest -c <your-recipe-file>.yaml

  This will generate detailed logs that can help identify the issue. Look for any specific error messages related to classification.
- Check for Null Values: Classification might fail if there are too many null values in the columns. As mentioned in <Slack Discussion>, ensure that the columns have sufficient non-null values for classification.
- Update Classification Library: Ensure that you are using an up-to-date version of the classification library. You can install a specific version manually if necessary:

    pip install acryl-datahub-classify==0.0.6
- Verify Network Connectivity: If you encounter SSL certificate errors, ensure that your environment can access the external resources required by the classification library. You can test this by opening a shell in the container and running a curl command from inside it:

    docker exec -it datahub-actions bash
    curl https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json
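
The configuration and log checks above can be partly automated. The following is a minimal Python sketch, not part of DataHub: the helper names `check_classification_nesting` and `classification_log_lines` are hypothetical, the first assumes the recipe has already been parsed into a dict, and the second assumes the debug output was saved to a text file. It only checks nesting and the `enabled` flag, and filters log lines by the substring "classif".

```python
import re
from typing import Any, Dict, List


def check_classification_nesting(recipe: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an already-parsed recipe dict.

    Hypothetical helper: it only looks at where `classification` sits
    and whether it is enabled; it does not validate the full recipe.
    """
    problems: List[str] = []
    source = recipe.get("source") or {}
    config = source.get("config") or {}

    # A `classification` key at the top level or directly under `source`
    # is outside the block the steps above say it should live in.
    if "classification" in recipe or "classification" in source:
        problems.append("classification should be nested under source.config")

    classification = config.get("classification")
    if not isinstance(classification, dict):
        problems.append("source.config.classification is missing")
        return problems

    if not classification.get("enabled"):
        problems.append("source.config.classification.enabled is not true")
    return problems


def classification_log_lines(log_text: str) -> List[str]:
    """Keep only the log lines that mention classification (case-insensitive)."""
    pattern = re.compile(r"classif", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]
```

To use it, load the recipe into a dict first (for example with `yaml.safe_load` from PyYAML, assuming it is installed) and pass the contents of the saved debug log to `classification_log_lines`. An empty list from the first helper means the nesting looks as expected; the second narrows a long debug log down to the lines worth reading or sharing.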
Would you like more detailed steps or further clarification on any specific part of the process?
Sources:
- <Slack Discussion>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/snowflake/snowflake_pre.md|Snowflake Configuration in DataHub>
Can you share your logs with us?