<@U06TM7M40RK>
I'm using DataHub to ingest metadata from Databricks. I have many queries for analysing the data, and some of them are in the wrong format, but DataHub uses those to analyse table usage. I got this error:
```
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/15 07:07:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```
How can I configure it as a warning, or ignore it completely? This is blocking me from having a successful ingestion.
To address the issue of SQL parsing errors during metadata ingestion from Databricks in DataHub, you can configure the system to treat these errors as warnings instead of failures. This will prevent the ingestion process from being blocked by such errors.
Adjust Logging Level: You can set the default log level to “WARN” to reduce the verbosity of logs and focus on warnings. This can be done by configuring the logging settings in your environment or script. For Spark, you can use the following command to set the log level:
```
sc.setLogLevel("WARN")
```
Ignore Specific Errors: If you want to completely ignore specific parsing errors, you might need to customize the ingestion logic or use a pre-processing step to filter out problematic queries before they are ingested.
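The pre-processing idea above can be sketched as a small filter that drops any query a parser rejects before ingestion sees it. This is a minimal sketch, not DataHub's own API: `toy_parse` is a hypothetical stand-in for a real SQL parser (for example `sqlglot.parse_one`), and its keyword check is purely illustrative.

```python
def filter_parseable(queries, parse):
    """Keep only the queries that the given parse() callable accepts."""
    kept = []
    for q in queries:
        try:
            parse(q)  # any exception means "skip this query"
            kept.append(q)
        except Exception:
            continue  # drop draft/invalid SQL instead of failing the run
    return kept

def toy_parse(q):
    """Hypothetical stand-in for a real SQL parser such as sqlglot.parse_one."""
    if not q.strip().upper().startswith(("SELECT", "INSERT", "WITH", "CREATE")):
        raise ValueError(f"not a parseable statement: {q!r}")
```

Swapping `toy_parse` for a real parser would let you discard draft queries before DataHub's usage analysis ever runs on them.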
Would you like more detailed steps on how to implement these changes, or do you need further assistance with specific configurations?
To configure DataHub to treat SQL parsing failures as warnings instead of errors, you can follow these steps:
Update DataHub Configuration: You need to modify the DataHub ingestion configuration to handle SQL parsing errors as warnings. This involves changing the error handling logic in the ingestion source code or configuration files.
Modify the Ingestion Source Code: If you have access to the source code, you can update the logic where SQL parsing errors are handled. For example, in the DataHub ingestion source code, you can change the error reporting from a failure to a warning. This is similar to the change made in the <https://github.com/datahub-project/datahub/pull/10924|DataHub GitHub pull request> where SQL parsing issues were reduced in severity from failures to warnings.
Use a Custom Docker Image: If you are deploying DataHub using Docker, you might need to create a custom Docker image with the updated source code. This involves building a Docker image with the necessary changes and using it in your deployment.
Deploy the Updated Configuration: Once you have made the necessary changes, deploy the updated configuration or Docker image to your DataHub environment.
Here is a general example of how you might modify the source code to report warnings instead of failures:
```
# Example code snippet to change error reporting
import logging

logger = logging.getLogger(__name__)

try:
    # Attempt to parse the SQL query
    parse_sql_query(query)
except SQLLineageException as e:
    # Log a warning instead of raising an error
    logger.warning(f"SQL parsing warning: {e}")
```
Would you like more detailed guidance on any of these steps, or do you need help with a specific part of the process?
<@U06TM7M40RK> this will not work for me
2. Adjust Logging Level: You can set the default log level to “WARN” to reduce the verbosity of logs and focus on warnings. This can be done by configuring the logging settings in your environment or script. For Spark, you can use the following command to set the log level: sc.setLogLevel("WARN")
because the error appears in the DataHub UI; there's no Spark code on my side!
If the error is occurring on the DataHub UI and you don’t have direct access to Spark code, you can still manage how SQL parsing errors are handled by adjusting the DataHub ingestion configuration or environment settings. Here are some steps you can take:
Modify DataHub Ingestion Configuration: Check if there is a configuration option in your DataHub ingestion setup that allows you to specify how errors are handled. Some ingestion frameworks provide options to continue on error or to log errors as warnings.
Check DataHub Documentation: Review the <https://datahubproject.io/docs|DataHub documentation> to see if there are any specific settings or flags that can be used to adjust error handling behavior during ingestion. This might include options to ignore certain types of errors or to log them at a different severity level.
Custom Error Handling: If the standard configuration options do not provide the flexibility you need, consider implementing a custom error handling mechanism. This might involve writing a custom ingestion script or plugin that wraps the standard ingestion process and handles errors according to your requirements.
Consult DataHub Community: If you are unable to find a solution through configuration or documentation, consider reaching out to the DataHub community for support. The community may have encountered similar issues and could offer insights or workarounds.
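One lightweight form of the custom error handling mentioned above, assuming the noise reaches you through Python's standard `logging` module, is a log filter that drops records matching the parsing-failure message. The logger name (`"datahub"`) and the matched substring below are assumptions; align them with whatever your ingestion run actually emits.

```python
import logging

class SuppressSqlParseWarnings(logging.Filter):
    """Drop log records that mention SQL parsing problems.

    The matched substring is an assumption; align it with the exact
    message your DataHub ingestion logs show.
    """
    def filter(self, record):
        return "sql parsing" not in record.getMessage().lower()

# Attach the filter to the handler that prints ingestion logs.
# Using "datahub" as the logger name is an assumption.
handler = logging.StreamHandler()
handler.addFilter(SuppressSqlParseWarnings())
logging.getLogger("datahub").addHandler(handler)
```

Attaching the filter to a handler (rather than to a logger) matters: records propagated from child loggers bypass ancestor logger filters, but they still pass through the ancestor's handler filters.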
Would you like more detailed guidance on any of these steps, or do you need help with a specific part of the process?
Modify DataHub Ingestion Configuration: Check if there is a configuration option in your DataHub ingestion setup that allows you to specify how errors are handled. Some ingestion frameworks provide options to continue on error or to log errors as warnings.
<@U06TM7M40RK> this is actually a limitation of DataHub. Why does it collect draft queries to get table statistics, and then throw an error? Better to skip that error and proceed. My whole pipeline is corrupted because of this!
I’m not able to find any data sources to answer your question. You might want to provide details about the specific DataHub version and ingestion configuration you are using. If this doesn’t help, please reach out to the DataHub support team for further assistance.
I’m not able to find any data sources to answer your question. You might want to provide information on whether you have customized any ingestion scripts or if you are using the default configuration. If this doesn’t help, please reach out to the DataHub support team for further assistance.
GitHub Issues: If you believe this is a bug or a feature request, you can open an issue on the <https://github.com/datahub-project/datahub/issues|DataHub GitHub repository>. Provide detailed information about your configuration, the error you are encountering, and any relevant logs.
Email Support: If you are using DataHub Cloud or have a support agreement, you might have direct email support available. Check any documentation or agreements you have for contact details.
Documentation: Review the <https://datahubproject.io/docs|DataHub documentation> for any troubleshooting guides or configuration tips that might help resolve your issue.
Would you like more detailed guidance on any of these options, or do you need help with a specific part of the process?
Thank you for letting us know you need a person to take a look at this thread.
Either a community member or one of our team members will try to assist when they have a moment; responses unfortunately might not be immediate. We hope this helps speed things along.