<@U06TM7M40RK>
Is it possible to write a query that extracts the schema and table/view name from the Dataset URN in the postgres database for me?
To extract the schema and table/view name from a Dataset URN in a PostgreSQL database using DataHub, you can use GraphQL queries. The URN for a PostgreSQL dataset typically follows the pattern: urn:li:dataset:(urn:li:dataPlatform:postgres,database.schema.table,PROD).
Here’s a GraphQL query that can help you retrieve the schema and table/view name from a Dataset URN:
query getDatasetSchema($urn: String!) {
  dataset(urn: $urn) {
    schemaMetadata {
      fields {
        fieldPath
        type
        nativeDataType
      }
    }
  }
}
In this query, replace $urn with the actual URN of your dataset. The query will return the schema metadata, including the fields, which can help you identify the schema and table/view name.
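To make that concrete, here is a minimal sketch of running the query from Python with the requests library. It assumes your DataHub GMS is reachable at http://localhost:8080/api/graphql, that YOUR_DATAHUB_TOKEN is a personal access token, and that the dataset URN shown is a placeholder; adjust all of these for your deployment.

import requests

GRAPHQL_URL = "http://localhost:8080/api/graphql"   # assumed GMS endpoint
TOKEN = "YOUR_DATAHUB_TOKEN"                         # placeholder token

# The same GraphQL query shown above.
QUERY = """
query getDatasetSchema($urn: String!) {
  dataset(urn: $urn) {
    schemaMetadata {
      fields { fieldPath type nativeDataType }
    }
  }
}
"""

# Placeholder Postgres dataset URN.
urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.customers,PROD)"

resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY, "variables": {"urn": urn}},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())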
If you need to extract the schema and table/view name directly from the URN string, you can parse the URN using a script or a function in your preferred programming language. The URN string can be split by commas and periods to extract the database, schema, and table/view names.
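As a rough illustration of that string-parsing approach, here is a small Python sketch. It assumes the three-part database.schema.table naming shown in the URN pattern above; names containing extra dots or other platforms would need a more careful parser.

def parse_postgres_dataset_urn(urn: str):
    """Split urn:li:dataset:(urn:li:dataPlatform:postgres,db.schema.table,PROD)
    into (database, schema, table). Assumes a three-part dataset name."""
    inner = urn.split("(", 1)[1].rstrip(")")   # "urn:li:dataPlatform:postgres,db.schema.table,PROD"
    _platform, name, _env = inner.split(",")
    database, schema, table = name.split(".")
    return database, schema, table

# Example with a placeholder URN:
print(parse_postgres_dataset_urn(
    "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.customers,PROD)"
))
# -> ('mydb', 'public', 'customers')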
Would you like more detailed guidance on how to implement this in a specific programming language or environment?
Sources:
- <https://forum.datahubproject.io/t/extracting-database-schema-details-in-datahub-via-graphql/269|Extracting database schema details in DataHub via GraphQL>
<@U06TM7M40RK> Yes, give me example of split URN in Kotlin
I’m not able to find any data sources to answer your question. You might want to specify the format of the URN you are working with. If this doesn’t help, please reach out to the DataHub support team for further assistance.
<@U06TM7M40RK>
I have one more question. I am doing tests (Expectations) using GX (Great expectation). How can I pass the test results to DataHub?
DataHub 0.14.1
GX 0.17.23
To pass test results from Great Expectations (GX) to DataHub, you can use the DataHubValidationAction within your Great Expectations Checkpoint. This integration sends assertions (expectations) and their results to DataHub using DataHub’s Python REST emitter.
Here’s a step-by-step guide on how to set this up:
- Install the Required Dependency: Ensure that you have the acryl-datahub package installed with the Great Expectations plugin. You can install it using pip:
pip install 'acryl-datahub[great-expectations]'
- Configure the DataHubValidationAction: Add the DataHubValidationAction to your Great Expectations Checkpoint configuration. This action will send the validation results to DataHub. Here’s an example configuration for a Checkpoint:
name: my_checkpoint
config_version: 1
class_name: Checkpoint
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_data_connector
      data_asset_name: my_table
    expectation_suite_name: my_expectation_suite
    action_list:
      - name: store_validation_result
        action:
          class_name: StoreValidationResultAction
      - name: update_data_docs
        action:
          class_name: UpdateDataDocsAction
      - name: datahub_action
        action:
          module_name: datahub.integrations.great_expectations.action
          class_name: DataHubValidationAction
          server_url: http://localhost:8080  # Replace with your DataHub GMS endpoint
          token: YOUR_DATAHUB_TOKEN  # Optional: If authentication is required
- Run the Checkpoint: Execute the Checkpoint to run your expectations and send the results to DataHub.
from great_expectations.data_context import DataContext

context = DataContext("/path/to/your/great_expectations")
checkpoint = context.get_checkpoint("my_checkpoint")
checkpoint_result = checkpoint.run()
- Verify in DataHub: After running the Checkpoint, you should be able to see the validation results in the DataHub UI under the “Validation” tab for the respective dataset.
Please note that the DataHubValidationAction currently supports v3 API datasources using the SqlAlchemyExecutionEngine. If you are using a different execution engine, such as Spark or Pandas, this integration may not work as expected.
Would you like more detailed guidance on any specific part of this process?
Sources:
- https://datahubproject.io/docs/0.13.1/managed-datahub/observe/data-contract#ui
- https://datahubproject.io/docs/0.14.0/metadata-ingestion/integration_docs/great-expectations
<@U06TM7M40RK>
Can it be done in a different way? Without using DataHubValidationAction
It seems likely that integrating Great Expectations (GX) with DataHub without using DataHubValidationAction would require a custom approach. The DataHubValidationAction is specifically designed to facilitate the integration between GX and DataHub by sending validation results directly to DataHub’s metadata store. However, if you want to explore alternative methods, here are a few possibilities:
- Custom Emitter: You could create a custom script or tool that extracts validation results from GX and formats them into DataHub’s metadata model. This would involve using DataHub’s REST API to manually send the validation results as metadata change proposals (a minimal sketch appears after this list).
- Use of DataHub’s Open API: DataHub provides an open API that you can use to ingest metadata. You could write a script that takes the output of your GX validations and sends it to DataHub using this API. This would require understanding the API endpoints and the expected data format.
- Intermediate Storage: Store the GX validation results in an intermediate storage system (e.g., a database or a file system) and then use a separate process to read from this storage and send the data to DataHub.
- Direct Database Integration: If your GX results are stored in a database, you could write a custom integration that reads from this database and sends the data to DataHub.
These approaches would require a good understanding of both GX’s output format and DataHub’s metadata model. Additionally, you would need to handle authentication and error handling when interacting with DataHub’s API.
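For the “Custom Emitter” option above, here is a minimal, hedged sketch using DataHub’s Python SDK (acryl-datahub) to emit a single assertion run result by hand. The URNs, token, run ID, and endpoint are placeholders, and a full integration would also emit an assertionInfo aspect describing the expectation itself; treat this as a starting point under those assumptions rather than a complete implementation.

import time

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AssertionResultClass,
    AssertionResultTypeClass,
    AssertionRunEventClass,
    AssertionRunStatusClass,
)

# Placeholder endpoint and token; point these at your DataHub GMS.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080", token="YOUR_DATAHUB_TOKEN")

# Placeholder URNs for the assertion and the dataset it validates.
assertion_urn = "urn:li:assertion:my-expectation-id"
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.customers,PROD)"

# One completed assertion run, marked as SUCCESS; map your GX result here.
run_event = AssertionRunEventClass(
    timestampMillis=int(time.time() * 1000),
    runId="gx-manual-run-001",
    assertionUrn=assertion_urn,
    asserteeUrn=dataset_urn,
    status=AssertionRunStatusClass.COMPLETE,
    result=AssertionResultClass(type=AssertionResultTypeClass.SUCCESS),
)

# Wrap the aspect in a metadata change proposal and send it to DataHub.
emitter.emit(MetadataChangeProposalWrapper(entityUrn=assertion_urn, aspect=run_event))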
Would you like more detailed guidance on any of these alternative methods?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1676438971.453759
- https://datahubspace.slack.com/archives/CUMUWQU66/p1697129337.629939
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><