Hi team, we’ve been using DataHub in our prod environment for quite some time now for data lineage, cataloging, etc., and I was wondering: what are some ways to publish test results (metadata from DQ checks) into the Validation tab of a DataHub dataset?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
<@U06TM7M40RK> Can you help with the questions in this thread?
We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a 👍 or 👎 so other folks in the Community know if you’re still looking for help on the topic!
To publish data quality check results (metadata from DQ checks) into the Validation tab of a DataHub dataset, you can use the DataHub Python SDK or REST API. The steps below walk you through the process, with emitter setup and examples following the list:
- Create Assertion Info: create an instance of AssertionInfo and send it to DataHub.
- Create Assertion Platform Mapping: map the assertion to a specific data platform.
- Publish Assertion Results: publish the results of each assertion run.
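All three steps are emitted through the same client. Here is a minimal setup sketch, assuming a REST emitter pointed at your GMS endpoint (the URL below is a placeholder, not a value from this thread):

from datahub.emitter.rest_emitter import DatahubRestEmitter

# Point this at your DataHub GMS instance; pass token="..." if your
# deployment requires authentication.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")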
Here is an example using the Python SDK:
Assertion Info
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.com.linkedin.pegasus2avro.assertion import (
    AssertionInfo, AssertionStdOperator, AssertionType,
    DatasetAssertionInfo, DatasetAssertionScope,
)

assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:assertion:<assertion-id>",
    aspect=AssertionInfo(
        type=AssertionType.DATASET,
        # Illustrative dataset assertion: a column-scoped NOT NULL check.
        # Adjust the scope, operator, and dataset URN to match your DQ check.
        datasetAssertion=DatasetAssertionInfo(
            scope=DatasetAssertionScope.DATASET_COLUMN,
            dataset="urn:li:dataset:<dataset-id>",
            operator=AssertionStdOperator.NOT_NULL,
        ),
        customProperties={"key": "value"},
    ),
)
emitter.emit_mcp(mcp=assertion_info_mcp)
Assertion Platform Mapping
from datahub.metadata.schema_classes import DataPlatformInstanceClass

# Ties the assertion to the platform it runs against,
# e.g. "urn:li:dataPlatform:snowflake".
assertion_platform_mcp = MetadataChangeProposalWrapper(
    entityUrn="urn:li:assertion:<assertion-id>",
    aspect=DataPlatformInstanceClass(platform="urn:li:dataPlatform:<platform-name>"),
)
emitter.emit_mcp(mcp=assertion_platform_mcp)
Assertion Results
from datahub.metadata.com.linkedin.pegasus2avro.assertion import AssertionRunEvent, AssertionResult, AssertionResultType, AssertionRunStatus
import time
from datetime import datetime
assertion_result_mcp = MetadataChangeProposalWrapper(
entityUrn="urn:li:assertion:<assertion-id>",
aspect=AssertionRunEvent(
timestampMillis=int(round(time.time() * 1000)),
assertionUrn="urn:li:assertion:<assertion-id>",
asserteeUrn="urn:li:dataset:<dataset-id>",
runId=datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ"),
result=AssertionResult(
type=AssertionResultType.SUCCESS, # or AssertionResultType.FAILURE
nativeResults={"observed_value": "some_value"},
),
status=AssertionRunStatus.COMPLETE,
),
)
emitter.emit_mcp(mcp=assertion_result_mcp)
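Note that nativeResults is a map with string values, so if your DQ check produces structured JSON output, serialize it before attaching. A small sketch, where check_output is a hypothetical result dict (not from this thread):

import json

# Hypothetical structured output from your DQ framework.
check_output = {"rows_checked": 1000, "rows_failed": 3}

# nativeResults values must be strings, so serialize nested JSON first;
# the resulting dict can then be passed as nativeResults= in AssertionResult above.
native_results = {"details": json.dumps(check_output)}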
This example demonstrates how to create and publish assertions and their results to DataHub. The AssertionInfo, DataPlatformInstanceClass, and AssertionRunEvent classes are used to construct the metadata change proposals.
Would you like more detailed information or examples on any specific part of this process?
Sources:
- <Slack Thread>
- <https://forum.datahubproject.io/t/pushing-data-quality-check-results-to-datahub-using-python-sdk-or-rest-api-through-examples/509|DataHub Forum>
Hint: Mention @runllm in the thread for followups.
We are using DataHub 0.13.2.4; the data source is Snowflake, and we ingest via the UI.
Can you please provide more details and examples of how to publish JSON results using the above methods?