Hi Team
After upgrading the CLI to v0.13.3 I don’t see the logs below anymore. All I see is the job report:
```
2024-06-20 07:11:57,326 INFO sqlalchemy.engine.Engine DESCRIBE FORMATTED acctlnkds.tcic_acct_pgm_sta_hist
[2024-06-20 07:11:57,326] INFO {sqlalchemy.engine.Engine:1863} - DESCRIBE FORMATTED acctlnkds.tcic_acct_pgm_sta_hist
```
Are these logs disabled completely? Without them it is not possible to know the progress or the last dataset ingested.
In my case there are millions of datasets, and if the ingestion fails for any reason (network issue, permission error, HiveException, SerDe error) I need to know which schema it completed up to, and then re-run with the remaining schemas.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g. 0.12.0)
What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to rate its responses so other folks in the Community know if you’re still looking for help on the topic!
Hi! To better assist you, could you please provide the following information?
Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g., 0.12.0)
What data source(s) are you integrating with DataHub? (e.g., BigQuery)
Based on your initial message, it seems you are using the CLI for ingestion and have upgraded to DataHub CLI version 0.13.3. You mentioned that you are not seeing detailed logs anymore and only see the job report. This change could be related to recent updates in the logging configuration or ingestion framework.
To address your concerns:
Logging Configuration: There might have been changes in the logging configuration with the new version. You can check the <https://github.com/datahub-project/datahub/blob/master/docs/cli.md|DataHub CLI documentation> to see if there are any updates or configurations you need to adjust to re-enable detailed logging.
Ingestion Progress: If detailed logs are crucial for your ingestion process, especially for tracking progress and handling errors, you might need to adjust the logging level or configuration in your ingestion scripts.
Would you like more detailed steps on how to check and adjust the logging configuration for DataHub CLI?
To re-enable detailed logging in DataHub CLI version 0.13.3 and ensure you can track the ingestion progress and handle errors effectively, follow these detailed steps:
Step 1: Verify Logging Configuration
First, ensure that the logging configuration is set to capture detailed logs. Note that the DataHub CLI is a Python application and uses Python’s standard logging module; it is the Java-based backend (GMS) that uses a logback.xml configuration.
Step 2: Apply Custom Logging Configuration
To apply a custom logging configuration for CLI ingestion, raise the log level for the relevant loggers in your ingestion script or environment before running the ingestion (adjusting JAVA_OPTS and logback.xml applies to the server side, not the CLI).
Step 3: Verify Ingestion Logs
After running the ingestion command, verify that the detailed logs are being captured as expected. The logs should include detailed information about the ingestion process, including the progress and any errors encountered.
Step 4: Handle Ingestion Failures
If ingestion fails due to network issues, permission errors, or other exceptions, review the detailed logs to identify the last successfully ingested dataset. You can then modify your ingestion recipe to resume from the last successful point.
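The resume step above can be sketched in plain Python. Note this is an illustrative helper, not a DataHub API: the function and variable names (`remaining_schemas`, `all_schemas`, `last_completed`) are assumptions, and the resulting list would be fed into whatever schema allow-list your recipe supports.

```python
# Sketch with illustrative names: after a failure, compute the schemas that
# still need ingestion so the re-run can skip everything already completed.
def remaining_schemas(all_schemas, last_completed):
    """Return the schemas after last_completed, preserving order."""
    if last_completed not in all_schemas:
        # Nothing recognizably completed: re-run everything.
        return list(all_schemas)
    idx = all_schemas.index(last_completed)
    return all_schemas[idx + 1:]

print(remaining_schemas(["sales", "acctlnkds", "hr"], "acctlnkds"))  # -> ['hr']
```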
To print the SQL queries being executed during ingestion, you can enable detailed logging for the SQLAlchemy engine, which is commonly used in DataHub for database interactions. This can be done by configuring the logging settings in your environment or ingestion script.
Step-by-Step Guide:
Modify Logging Configuration:
Ensure that the logging configuration captures detailed SQLAlchemy logs. You can do this by setting up logging directly in your Python script or environment.
Set Up Logging in Python Script:
If you are running the ingestion via a Python script, you can configure logging directly in the script.
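A minimal sketch of that setup, using only the standard library (the format string below mimics the log lines from the original message, but is an assumption, not DataHub’s exact configuration):

```python
import logging

# Raising the "sqlalchemy.engine" logger to INFO makes the engine emit each
# executed statement (e.g. the DESCRIBE FORMATTED lines) through the root
# handler configured by basicConfig.
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s {%(name)s:%(lineno)d} - %(message)s",
)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```

Any SQLAlchemy engine created after this point will log its statements; the ingestion pipeline code itself needs no changes.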
Set Up Logging in Shell Environment:
If you prefer to configure logging via the shell environment, you can set an environment variable such as SQLALCHEMY_ECHO=True and have your ingestion script honor it; note that plain SQLAlchemy does not read this variable by itself.
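A small shim like the following (a sketch, assuming you control the ingestion script; `configure_sql_echo` is an illustrative name, not a DataHub or SQLAlchemy function) can translate the environment variable into logger configuration:

```python
import logging
import os

def configure_sql_echo(env):
    """Enable SQL statement logging when SQLALCHEMY_ECHO=True is set."""
    if env.get("SQLALCHEMY_ECHO", "").lower() != "true":
        return False
    logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
    return True

# Call once near the top of the script, before the pipeline runs:
configure_sql_echo(os.environ)
```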
Verify Logs:
After setting up the logging configuration, run your ingestion command. The SQL queries being executed should now be printed in the console output or log files.
I’m not able to find any data sources to answer your question.
Could you provide more details on your ingestion setup, such as the specific DataHub version, the database type you’re integrating with, and any custom configurations you might have?
If this doesn’t help, please reach out to the DataHub support team for further assistance. Hint: Mention @runllm in the thread for followups.
I’m not able to find any data sources to answer your question.
Could you specify which DataHub version you are using and the exact command you are running for ingestion?
If this doesn’t help, please reach out to the DataHub support team for further assistance. Hint: Mention <@U06TM7M40RK> in the thread for followups.
<@UV14447EU> <@U01GZEETMEZ>
Can you please help me with the detailed logging during ingestion?
I tried DATAHUB_DEBUG=true; this prints the query, but also the cURL requests to GMS, so I can’t use it since it prints unnecessary data.
Is it possible to get the old way of printing logs (query + CLI report)?
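One possible middle ground, sketched below under the assumption that the noisy request lines come from the HTTP-client loggers rather than the SQL logger (this is a workaround idea, not an official DataHub flag):

```python
import logging

# Instead of the global DATAHUB_DEBUG=true, raise only the SQL logger and
# quiet the HTTP-client loggers that typically produce request/response noise.
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("requests").setLevel(logging.WARNING)
```

This keeps the per-statement query lines while suppressing the lower-level HTTP chatter; if the cURL output comes from a DataHub-specific logger instead, that logger’s name would need to be quieted the same way.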