Hi,
I am trying to run a sql-queries ingestion. It says: “Pipeline finished successfully; produced 3 events in 0.55 seconds.”
But I can’t see anything in DataHub. I am using v0.13.2.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
Are you using UI or CLI for ingestion?
Which DataHub version are you using? (e.g. 0.12.0)
What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
Based on the information provided and the data sources available, here are some potential troubleshooting steps:
Verify Ingestion Configuration: Ensure that your ingestion configuration is correctly set up. Here is an example configuration for SQL queries ingestion:
Check Elasticsearch Indexing: Sometimes, data might be ingested but not indexed correctly in Elasticsearch, which is used by DataHub for search and browse functionalities. Verify the health of your Elasticsearch cluster and ensure that the indices are being updated correctly.
Inspect Ingestion Logs: Review the ingestion logs for any warnings or errors that might indicate why the data is not appearing in the UI. Sometimes, partial ingestion or issues with specific records can cause such problems.
Ensure Correct Configuration: Double-check the configuration used for the ingestion to ensure that all necessary parameters are correctly set. A run can report success without anything showing up in the UI. For example, a recipe whose sink is file or console rather than datahub-rest finishes successfully but never sends the events to the DataHub server.
Would you like more detailed steps on any of these troubleshooting points or further assistance on a specific part of the process?
Sure, let’s dive deeper into the steps to troubleshoot your issue with SQL queries ingestion in DataHub.
Detailed Steps to Troubleshoot SQL Queries Ingestion
Verify Ingestion Configuration:
Ensure that your ingestion configuration YAML file is correctly set up. Here is a more detailed example configuration for SQL queries ingestion:
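The recipe itself is YAML. As a rough sketch, the Python snippet below renders one for the sql-queries source so the values can be reviewed before running. The platform, default_db, file names, and server URL are placeholders and not values from this thread, and the field names are my recollection of the sql-queries source docs, so please check them against the documentation for your version.

```python
from pathlib import Path

# Recipe template for the sql-queries source with a datahub-rest sink.
# All values filled in below are placeholders (assumptions), not settings
# taken from the original question.
RECIPE_TEMPLATE = """\
source:
  type: sql-queries
  config:
    platform: "{platform}"
    default_db: "{default_db}"
    query_file: "{query_file}"
sink:
  type: datahub-rest
  config:
    server: "{server}"
"""


def render_recipe(platform, default_db, query_file, server="http://localhost:8080"):
    """Return the recipe YAML text for a sql-queries ingestion run."""
    return RECIPE_TEMPLATE.format(
        platform=platform,
        default_db=default_db,
        query_file=query_file,
        server=server,
    )


def write_recipe(path, **kwargs):
    """Render the recipe and write it to `path`; returns the rendered text."""
    text = render_recipe(**kwargs)
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Calling write_recipe("ingest_queries.yml", platform="snowflake", default_db="MY_DB", query_file="./queries.json") would produce a file you can pass to datahub ingest -c. The explicit datahub-rest sink is there on purpose: it makes it obvious where the events are supposed to go.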
Check Elasticsearch Indexing:
DataHub relies on Elasticsearch for search and browse functionalities. If the ingestion pipeline logs indicate that events were produced but you can’t see them in the UI, it could be an indexing issue. Verify the health of your Elasticsearch cluster and check if the indices are being updated.
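One way to do that check from a script is sketched below, using Elasticsearch's _cluster/health and _cat/indices endpoints. The base URL http://localhost:9200 is an assumption (the usual local quickstart port), and treating yellow as acceptable is a judgment call for single-node setups.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_health(health):
    """Reduce a _cluster/health response to the fields worth looking at."""
    status = health.get("status", "unknown")
    return {
        "status": status,
        # yellow is common on single-node clusters (unassigned replicas)
        "healthy": status in ("green", "yellow"),
        "unassigned_shards": health.get("unassigned_shards", 0),
    }


def empty_indices(indices):
    """Names of indices with no documents, from _cat/indices?format=json."""
    return sorted(
        entry["index"]
        for entry in indices
        if int(entry.get("docs.count") or 0) == 0
    )


def check_elasticsearch(base_url="http://localhost:9200"):
    """Query a live cluster; base_url is an assumed default."""
    health = fetch_json(f"{base_url}/_cluster/health")
    indices = fetch_json(f"{base_url}/_cat/indices?format=json")
    return summarize_health(health), empty_indices(indices)
```

If the status is red, or the dataset-related indices stay empty after an ingestion run, that points toward indexing rather than the recipe.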
Inspect Ingestion Logs:
Review the logs generated during the ingestion process. Run the ingestion command in debug mode to get more detailed logs:
datahub --debug ingest -c ingest_queries.yml
Look for any warnings or errors that might indicate why the data is not appearing in the UI.
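Debug output can be long, so a small filter helps. The sketch below only assumes that log lines carry a standard level name such as WARNING or ERROR; it does not depend on the exact DataHub log layout, and the file name ingest.log in the usage note is hypothetical.

```python
import re

# Match standard Python logging level names as whole words.
LEVEL_PATTERN = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def find_problem_lines(log_text):
    """Return (line_number, level, line) for every warning or error line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = LEVEL_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), line.strip()))
    return hits
```

If you saved the command's output to ingest.log, then find_problem_lines(open("ingest.log", encoding="utf-8").read()) lists the lines worth reading first.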
Ensure Correct Configuration:
Double-check your configuration file to ensure all necessary parameters are correctly set. Here is a sample JSON file and YAML configuration for reference:
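For the query file, my understanding is that the sql-queries source reads one JSON object per line, with a required query field and optional fields such as timestamp, user, upstream_tables, and downstream_tables. Treat that field list as an assumption to verify against the docs. The sketch below writes such a file and checks that every line parses; the table and user names are made up for illustration.

```python
import json
from pathlib import Path

# Illustrative entries only; table and user names are placeholders.
SAMPLE_QUERIES = [
    {
        "query": "INSERT INTO my_db.public.orders_clean SELECT * FROM my_db.public.orders",
        "timestamp": 1715000000,
        "user": "user_a",
        "upstream_tables": ["my_db.public.orders"],
        "downstream_tables": ["my_db.public.orders_clean"],
    },
    {"query": "SELECT id FROM my_db.public.orders_clean"},
]


def write_query_file(path, queries):
    """Write one JSON object per line; returns the number of lines written."""
    lines = [json.dumps(entry) for entry in queries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def validate_query_file(path):
    """Return (line_number, problem) for lines that would likely be rejected."""
    problems = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(entry, dict) or not isinstance(entry.get("query"), str):
            problems.append((number, "missing string field 'query'"))
    return problems
```

A file that is a single JSON array, or pretty-printed across several lines, would fail this check, which is a common reason for a run that produces very few events.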
Check DataHub UI:
After ensuring that the ingestion pipeline runs successfully, check the DataHub UI for the ingested metadata. Sometimes, a simple refresh or clearing the browser cache can help in displaying the latest ingested data.
Review Elasticsearch Logs:
If the data is still not visible, review the Elasticsearch logs to ensure that the data is being indexed correctly. You might need to reindex the data if there are issues with the current indices.
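Before reindexing, it can help to ask the metadata service directly whether the entity was stored, since that bypasses the search index. The sketch below builds a dataset URN and looks it up through GMS. The /entities/<urn> endpoint, the port 8080, and the 404 handling are assumptions based on a default deployment and may differ in yours; the dataset name is a placeholder.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def make_dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN in the standard DataHub format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def entity_url(urn, gms_url="http://localhost:8080"):
    """URL for fetching one entity from GMS (assumed endpoint layout)."""
    return f"{gms_url}/entities/{urllib.parse.quote(urn, safe='')}"


def fetch_entity(urn, gms_url="http://localhost:8080", token=None):
    """Return the entity JSON from GMS, or None if the server answers 404."""
    request = urllib.request.Request(entity_url(urn, gms_url))
    if token:
        # Only needed when metadata service authentication is enabled.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None
        raise
```

If fetch_entity(make_dataset_urn("snowflake", "my_db.public.orders")) returns data but the UI shows nothing, the events were stored and the problem is more likely on the indexing side.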
Would you like more detailed guidance on any specific step or further assistance with a particular part of the process?