Retrieving Database Schemas Using GraphQL and OpenAPI

Original Slack Thread

Hey everyone - is there a chance to retrieve the schemas of all connected databases via GraphQL or OpenAPI or the PythonSDK? Either would work for me. Currently I have a Postgres DB connected with multiple databases. #graphql <#C03QF614JKW|openapi> #schema

Hello, you may have a chance to initiat an ingestion execution request via GraphQL. The example is as follows:
mutation {
createIngestExecutionRequest (
Input {
ingestionSourceUrn: "urn:li:dataHubIngestionSource:c33f55ce-e573-49ce-b171-3fb17bd906af",
}
)
}
If you want to extract all the databases that Postgrs connects to, you may need to make sure that all of them are located on the same host as configured in the Yaml file and that you have full access to them. here is the guide rearding how to config yaml for PG for your reference. <https://datahubproject.io/docs/generated/ingestion/sources/postgres|PG Yaml config>

Hey, I tried to run the suggested query but I get the following error:

{
"errors": [
{
"message": "Invalid Syntax : offending token '{' at line 3 column 23",
"locations": [
{
"line": 3,
"column": 23
}
],
"extensions": {
"classification": "InvalidSyntax"
}
}
],
"data": null,
"extensions": {}
}

I adapted the urn accordingly, but still I get an error.
I’m also not 100% sure if a mutation or the createIngestExecutionRequest is the correct mutation.
I’ll just explain again what I exactly need, maybe my description is misleading…

I have connected a postgresql database to datahub and I can see it in the browser. The database “actor”. has the following URN: urn:li:dataset(urn:li:dataPlatform:postgres,DatabaseNameToBeIngested.public.actor,PROD)

I now what to retrieve the schema of that database (ideally from all connected databases) via the GraphQL API. So in fact I would need a result like this.

"schemaMetadata": {
"name": "schemaMetadata",
"type": "VERSIONED",
"version": 0,
"value": {
"__type": "SchemaMetadata",
"schemaName": "DatabaseNameToBeIngested.public.actor",
"platform": "urn:li:dataPlatform:postgres",
"version": 0,
"created": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:unknown"
},
"hash": "",
"platformSchema": {
"__type": "MySqlDDL",
"tableSchema": ""
},
"fields": [
{
"fieldPath": "actor_id",
"nullable": false,
"type": {
"type": {
"__type": "NumberType"
}
},
"nativeDataType": "INTEGER()",
"recursive": false,
"isPartOfKey": true
},
{
"fieldPath": "first_name",
"nullable": false,
"type": {
"type": {
"__type": "StringType"
}
},
"nativeDataType": "VARCHAR(length=45)",
"recursive": false,
"isPartOfKey": false
},
{
"fieldPath": "last_name",
"nullable": false,
"type": {
"type": {
"__type": "StringType"
}
},
"nativeDataType": "VARCHAR(length=45)",
"recursive": false,
"isPartOfKey": false
},
{
"fieldPath": "last_update",
"nullable": false,
"type": {
"type": {
"__type": "TimeType"
}
},
"nativeDataType": "TIMESTAMP()",
"recursive": false,
"isPartOfKey": false
}
]
},
"created": {
"time": 1691307442000,
"actor": "urn:li:corpuser:__datahub_system"
}
}

I got this via the SwaggerUI using the *GET*<http://localhost:9002/openapi/swagger-ui/index.html#/Entities/getEntities|/entities/v1/latest> endpoint.

Di you have an idea how to get the schema via GraphQL?

I see. Try this: query dataset {
dataset (
urn:"urn:li:dataset(urn:li:dataPlatform:postgres,DatabaseNameToBeIngested.public.actor,PROD)"
) {
properties {
created,
createdActor
}
name,
lastIngested,
schemaMetadata{
name,
platformSchema{
__typename
},
hash,
version,
createdAt
fields{
fieldPath,
description,
nullable,
nativeDataType,
recursive,
isPartOfKey
}
},
platform {
properties{
type,
displayName
}
},
}
}

That works! Many thanks, really cool :star-struck:

Do you know if there is a GraphQL query which returns all URNs globally or at least for each connected database?

Probably you can try. searchAcrossEntites. here is an example. query searchAcrossEntities {searchAcrossEntities(input: {types: [DATASET], query: "*",orFilters:{and:{
field:"urn" values:"urn:li:dataPlatform:postgres" condition:CONTAIN}} searchFlags:{ skipCache:true}}) { searchResults {entity {urn type}}}} or query searchAcrossEntities {searchAcrossEntities(input: {types: [DATASET], query: "postgres", searchFlags:{ skipCache:true}}) {searchResults {entity {urn type}}}} Unlike the second query, the first query has an orFilters section that gives you more specific conditions.

The first query is exactly what I needed! Awesome, many thanks

Have you been active for longer time in the “DataHub” Cosmos? I have more a machine learning (social network graphs) background

Not much actually. I just like to watch this channel a lot…