Troubleshooting SQL Query Metadata Ingestion in DataHub with Tagging RunLLM

Original Slack Thread

Hi,
I am trying to ingest metadata through the DataHub UI using the sql-queries source, and when I provide the query as {"query": "SELECT* FROM sys.schemaName"} it throws this error:

{
  "title": "No metadata was produced by the source",
  "message": "Please check the source configuration, filters, and permissions.",
  "context": []
}
So, do we need to provide the database schema name, or each table name in the schema, to produce metadata?
Also, for the optional fields in the query file format (timestamp, upstream, downstream): what should we provide? Should the timestamp be the current timestamp, and what data do we need to give for upstream and downstream?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!

To address your issue with the SQL query ingestion in DataHub, let’s break down the requirements and the error message you’re encountering.

Error Message
The error message "No metadata was produced by the source" suggests that the ingestion process did not generate any metadata. This could be due to several reasons, such as incorrect query format, missing configuration, or lack of permissions.

Query File Format
According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/sql-queries|DataHub documentation>, the query file should be a newline-delimited JSON file where each line is a separate JSON object. Here’s a breakdown of the fields:

  • query: The SQL query to parse. Ensure that your query is correctly formatted and valid for the SQL dialect you are using.
  • timestamp (optional): The timestamp of the query, in seconds since the epoch. Ideally this is the time the query was executed; if that is not known, the current timestamp is acceptable.
  • user (optional): The user who ran the query. This should be a string representing the user.
  • operation_type (optional): Platform-specific operation type, used if the operation type can’t be parsed.
  • downstream_tables (optional): A list of tables that the query writes to. This is used if the query can’t be parsed to determine the downstream tables.
  • upstream_tables (optional): A list of tables the query reads from. This is used if the query can’t be parsed to determine the upstream tables.

Example Query File
Here is an example of how your query file should look:

{"query": "SELECT * FROM sys.schemaName", "timestamp": 1689232738, "user": "user_a", "upstream_tables": ["your_database.your_schema.your_table"]}
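If the parser cannot work out lineage from the SQL alone, you can state it explicitly per query. Here is a hedged sketch of a two-line query file; the database, schema, table, and user names are placeholders you would replace with your own, and the timestamps are just example epoch values:

{"query": "SELECT col_a, col_b FROM your_database.your_schema.source_table", "timestamp": 1689232738, "user": "user_a", "upstream_tables": ["your_database.your_schema.source_table"]}
{"query": "INSERT INTO your_database.reporting.daily_summary SELECT * FROM your_database.your_schema.source_table", "timestamp": 1689236338, "user": "user_b", "upstream_tables": ["your_database.your_schema.source_table"], "downstream_tables": ["your_database.reporting.daily_summary"]}

Each line is a complete JSON object, and lineage is emitted between the upstream and downstream tables named in (or parsed from) each entry.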

Troubleshooting Steps

  1. Check Query Syntax: Ensure that your SQL query is valid and correctly formatted for the SQL dialect you are using. In your example, sys.schemaName appears to be a placeholder rather than a real table; the query needs to reference actual tables for the parser to emit lineage.
  2. Provide Schema and Table Names: If the query cannot be parsed on its own, populate the upstream_tables and downstream_tables fields with fully qualified table names, including the database and schema.
  3. Verify Configuration: Double-check your ingestion recipe to ensure it is correctly set up, including the platform, the path to the query file, and the default database and schema (see the example recipe after this list).
  4. Permissions: Ensure that the user running the ingestion has the necessary permissions to access the metadata.
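
For reference, here is a minimal sketch of what an ingestion recipe for the sql-queries source can look like. It assumes a SQL Server (mssql) platform, a local query file, and a local DataHub instance; the database, schema, and server values are placeholders, and the exact field names should be verified against the sql-queries source documentation linked above:

source:
  type: sql-queries
  config:
    platform: mssql              # platform whose SQL dialect the queries use
    default_db: your_database    # assumed database for unqualified table names
    default_schema: your_schema  # assumed schema for unqualified table names
    query_file: ./queries.json   # newline-delimited JSON file described above
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080  # your DataHub GMS endpoint

The platform, default_db, and default_schema values are what let the parser resolve partially qualified table names in your queries to full dataset URNs.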

Would you like more detailed steps on any specific part of this process?


Hi, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><