Hi folks, I'm running DataHub v0.13.1. If I ingest the actual data into OpenSearch/Elasticsearch, can we query that data in DataHub? How? I saw a Query button under each dataset; can we run any query there?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We're experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don't forget to react to its responses so other folks in the Community know whether you're still looking for help on the topic!
To address your question about querying actual data in DataHub:
- DataHub Version: You mentioned you are using DataHub v0.13.1.
- Ingestion Method: Are you using the UI or CLI for ingestion?
- Data Source: What data source(s) are you integrating with DataHub (e.g., BigQuery, MongoDB, etc.)?
Querying Actual Data in DataHub
DataHub focuses on metadata management; it does not store or query the actual data. According to <About DataHub Search | DataHub documentation>, DataHub provides no way to query the actual data in your source systems. It catalogs metadata such as schema information, descriptions, and lineage. Ingesting from an Elasticsearch/OpenSearch cluster therefore brings in metadata about its indices (for example, index names and field mappings), not the documents themselves.
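Because the records stay in your search cluster, you query the actual documents by sending requests to Elasticsearch/OpenSearch directly, not through DataHub. Below is a minimal sketch that uses only the Python standard library. It assumes a cluster reachable at `http://localhost:9200` without authentication, and it uses a hypothetical index `my_index` and field `status`. The `_search` endpoint and the `match` query are standard Elasticsearch/OpenSearch Query DSL.

```python
import json
import urllib.request


def build_search_body(field: str, value: str, size: int = 10) -> dict:
    """Build a Query DSL body that matches `value` in `field`, returning at most `size` hits."""
    return {"query": {"match": {field: value}}, "size": size}


def search_index(host: str, index: str, body: dict) -> dict:
    """POST the body to the index's _search endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (hypothetical host, index, and field; replace with your own):
# result = search_index("http://localhost:9200", "my_index", build_search_body("status", "active", 5))
# for hit in result["hits"]["hits"]:
#     print(hit["_source"])
```

DataHub only holds the metadata about `my_index`. The query and its results go straight between your client and the cluster.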
Query Button Under Dataset
The Queries tab on a dataset page does not run queries against your data. It displays SQL queries associated with the dataset. These include queries that users add manually for documentation and, for some sources, queries extracted from query logs or usage statistics. Any query listed there has to be executed in the source system itself.
Profiling and Sample Values
If you have profiling enabled, DataHub can show sample values in the UI. Profiling helps in understanding the data distribution and quality but does not allow for querying the actual data directly. For more details on enabling profiling, you can refer to the <https://datahubproject.io/docs/next/metadata-ingestion/docs/dev_guides/sql_profiles/|profiling documentation>.
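For an Elasticsearch/OpenSearch source, you turn profiling on in the ingestion recipe. The sketch below builds a recipe as a Python dict and runs it with DataHub's programmatic `Pipeline` API. It assumes the `acryl-datahub` package with the `elasticsearch` plugin installed, a cluster at `localhost:9200`, and DataHub GMS at `http://localhost:8080`. It also assumes the `elasticsearch` source accepts a `profiling.enabled` option; check that source's documentation for your version, because the SQL profiling guide linked above covers SQL sources.

```python
def build_recipe(es_host: str, datahub_server: str, profiling: bool = True) -> dict:
    """Return an ingestion recipe (as a dict) for the elasticsearch source."""
    return {
        "source": {
            "type": "elasticsearch",
            "config": {
                "host": es_host,
                # Assumed option: verify that your DataHub version's elasticsearch source supports it.
                "profiling": {"enabled": profiling},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_server},
        },
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe programmatically; needs `acryl-datahub[elasticsearch]` installed."""
    # Imported lazily so that building a recipe does not require the package.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

For example, `run_recipe(build_recipe("localhost:9200", "http://localhost:8080"))` would ingest index metadata along with any profiles. A profile is a stored summary; it still does not let you run queries on the documents.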
Example GraphQL Query for Metadata
Here is an example of a GraphQL query to fetch metadata about a dataset, including its schema and lineage:
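```graphql
query {
  dataset(urn: "your_dataset_urn") {
    urn
    name
    properties {
      description
    }
    schemaMetadata {
      fields {
        fieldPath
        nativeDataType
        description
      }
    }
    # Aliases request the same lineage field twice, once per direction.
    upstream: lineage(input: { direction: UPSTREAM, start: 0, count: 100 }) {
      total
      relationships {
        entity {
          urn
          type
        }
      }
    }
    downstream: lineage(input: { direction: DOWNSTREAM, start: 0, count: 100 }) {
      total
      relationships {
        entity {
          urn
          type
        }
      }
    }
  }
}
```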
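Replace "your_dataset_urn" with the URN of the dataset you are interested in. For a hypothetical Elasticsearch index named `my_index`, that would be `urn:li:dataset:(urn:li:dataPlatform:elasticsearch,my_index,PROD)`.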
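To run a metadata query like this outside the UI, POST it to DataHub's GraphQL endpoint. Below is a minimal standard-library sketch. It assumes GMS is reachable at `http://localhost:8080/api/graphql` and that you have a personal access token, which is sent as a bearer token. `make_dataset_urn` here is a local helper written for this example, and the index name `my_index` is hypothetical. The query is deliberately trimmed to a few schema fields.

```python
import json
import urllib.request

# Trimmed metadata query; the URN is passed as a GraphQL variable.
DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    name
    schemaMetadata { fields { fieldPath nativeDataType } }
  }
}
"""


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN in DataHub's standard format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_graphql_request(endpoint: str, token: str, urn: str) -> urllib.request.Request:
    """Build a POST request carrying the query, its variables, and a bearer token."""
    payload = {"query": DATASET_QUERY, "variables": {"urn": urn}}
    return urllib.request.Request(
        url=endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_dataset(endpoint: str, token: str, urn: str) -> dict:
    """Send the request and return the `dataset` object from the JSON response."""
    with urllib.request.urlopen(build_graphql_request(endpoint, token, urn)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["data"]["dataset"]


# Example usage (hypothetical index name and token):
# urn = make_dataset_urn("elasticsearch", "my_index")
# dataset = fetch_dataset("http://localhost:8080/api/graphql", "<personal-access-token>", urn)
```

The response contains only metadata (URN, name, schema fields). It never contains the documents stored in the index.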
Would you like more details on any specific aspect of querying metadata or setting up profiling?
Sources:
- <About DataHub Search | DataHub Search Documentation>
- <https://datahubproject.io/docs/next/metadata-ingestion/docs/dev_guides/sql_profiles/|Profiling Documentation>