How to Retrieve Database Table Column Values from Datahub Using GraphQL Query?

Original Slack Thread

how to get database table column values from datahub using graphql query or another way?

hi <@U06D8QGKA5N>

To retrieve database table column values from Datahub you can use the GraphQL query language to interact with the Datahub API.
Datahub provides a GraphQL API that allows you to query and retrieve metadata about datasets and their columns.
Here’s a general example of how you might construct a GraphQL query to get information about a specific dataset and its columns:

query {
  dataset(urn: "your_dataset_urn") {
    name
    description
    schema {
      columns {
        name
        type
        description
      }
    }
    upstreamLineage {
      upstreams {
        dataset {
          name
          urn
        }
      }
    }
    downstreamLineage {
      downstreams {
        dataset {
          name
          urn
        }
      }
    }
  }
}```
Replace `"your_dataset_urn"` with the actual URN of the dataset you are interested in. This query fetches information about the dataset, its columns, and its upstream and downstream lineage.

Remember to check the Datahub documentation for your specific version to ensure you are using the correct syntax and field names. Additionally, make sure that your Datahub instance is properly configured to expose the GraphQL API.

Here are some key points from the query:

- `dataset`: Fetches information about a specific dataset.
- `name` and `description`: Retrieve basic information about the dataset.
- `schema`: Fetches information about the dataset's schema, including its columns.
- `upstreamLineage` and `downstreamLineage`: Fetch information about datasets that are upstream or downstream from the current dataset.

Or you can use Datahub SDK for Python, the script might look something like this:
```from datahub import DatahubClient

# Replace these with your actual Datahub instance and dataset URN
datahub_base_url = "<https://your-datahub-instance.com>"
dataset_urn = "urn:li:dataset:your_dataset_urn"

# Create a Datahub client
datahub_client = DatahubClient(base_url=datahub_base_url)

# Get dataset metadata
dataset_metadata = datahub_client.get_dataset_by_urn(dataset_urn)

# Extract column information
columns = dataset_metadata.schema.columns

# Now 'columns' contains information about each column in the dataset
for column in columns:
    column_name = column.name
    column_type = column.type
    print(f"Column: {column_name}, Type: {column_type}")```

Thanks for your response avani, But I want to get data point values not the column details. for example if mysql db has a table name as fruits and that column data point values are apple, banana and orange so I want all these three values.

Unfortunately, DataHub itself does not directly provide a way to read the raw values from a dataset, Its more over focus on metadata schema and fields types, description etc.
https://datahubproject.io/docs/generated/metamodel/entities/dataset/

Thanks

can we not get some values as sample values also?

I am not sure about this, hey <@U01GZEETMEZ> Do you have any idea on this ?

If you have profiling enabled https://datahubproject.io/docs/next/metadata-ingestion/docs/dev_guides/sql_profiles/, then we will can show sample values in the UI

Otherwise, datahub does not currently allow for querying data directly from the db