<@U06TM7M40RK> When creating a data contract for a dataset via the DataHub UI, what conditions must be met? I currently get the following message: Add a few assertions on this entity to create a data contract out of them.
To create a data contract in a dataset via the DataHub UI, you need to ensure that certain conditions are met, specifically related to assertions. According to the <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract#ui|DataHub documentation>, the following steps and conditions are necessary:
- Navigate to the Dataset Profile: Go to the dataset profile for which you want to create a data contract.
- Validations Tab: Under the “Validations” tab, you should find the “Data Contracts” section.
- Create Assertions: Before you can create a data contract, you need to have assertions in place. These assertions can be related to data freshness, schema, and data quality. The message you are seeing (“Add a few assertions on this entity to create a data contract out of them”) indicates that you need to define these assertions first.
- Select Assertions: Once you have created the necessary assertions, you can select them to be included in the data contract. The assertions serve as the building blocks of the data contract, verifying that the contract’s conditions are met.
- Create the Data Contract: After selecting the assertions, you can proceed to create the data contract.
If you are not seeing the option to create assertions or data contracts, ensure that your DataHub instance is properly configured and that you have the necessary permissions. Additionally, make sure you are using a version of DataHub that supports these features.
Would you like more detailed instructions on how to create assertions or any other aspect of this process?
Sources:
- <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract#ui|DataHub Data Contracts UI Guide>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> I’m using the DataHub UI, so I can’t create freshness, schema, or data quality assertions from it (they are only available in the Cloud version), and I’m not ingesting from dbt either. Do I have any other option?
If you are using the open-source version of DataHub and cannot create freshness, schema, or data quality assertions directly from the UI, you still have a few options to work with assertions and data quality checks:
- Use Third-Party Tools: You can integrate third-party data quality tools like Great Expectations or dbt tests to perform data quality checks. These tools can run assertions and then publish the results back to DataHub. This approach is commonly used when the native assertion features are not available in the open-source version.
- Python SDK or REST API: You can use the DataHub Python SDK or REST API to programmatically create and manage assertions. This involves creating assertion objects and publishing them to DataHub. You can define custom assertions and report their results using these APIs.
- Custom Integrations: If you have a custom data quality system, you can build an integration to publish the results to DataHub. This involves creating assertion objects and using the DataHub API to send the results.
Here is a basic example of how you might use the Python SDK to create and publish assertions:
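import time
from datetime import datetime, timezone

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.com.linkedin.pegasus2avro.assertion import (
    AssertionInfo,
    AssertionResult,
    AssertionResultType,
    AssertionRunEvent,
    AssertionRunStatus,
    AssertionStdOperator,
    AssertionType,
    DatasetAssertionInfo,
    DatasetAssertionScope,
)

# Assumption: GMS is reachable at this address; change it for your deployment.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Placeholders: substitute your own assertion and dataset URNs.
assertion_urn = "urn:li:assertion:<assertion-id>"
dataset_urn = "urn:li:dataset:<dataset-id>"

# Create Assertion Info: describes what the assertion checks
assertion_info_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=AssertionInfo(
        type=AssertionType.DATASET,
        datasetAssertion=DatasetAssertionInfo(
            dataset=dataset_urn,
            scope=DatasetAssertionScope.DATASET_COLUMN,
            operator=AssertionStdOperator.NOT_NULL,  # example operator
            # Add more fields as required (fields, nativeType, parameters, ...)
        ),
        customProperties={"key": "value"},
    ),
)
emitter.emit_mcp(assertion_info_mcp)

# Publish Assertion Results: one run event per evaluation of the assertion
assertion_result_mcp = MetadataChangeProposalWrapper(
    entityUrn=assertion_urn,
    aspect=AssertionRunEvent(
        timestampMillis=int(time.time() * 1000),
        assertionUrn=assertion_urn,
        asserteeUrn=dataset_urn,
        # Use UTC so the trailing "Z" in the run id is accurate.
        runId=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        result=AssertionResult(
            type=AssertionResultType.SUCCESS,  # or AssertionResultType.FAILURE
            nativeResults={"observed_value": "some_value"},
        ),
        status=AssertionRunStatus.COMPLETE,
    ),
)
emitter.emit_mcp(assertion_result_mcp)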
This example demonstrates how to create and publish assertions using the Python SDK. You can adapt this to fit your specific use case and data quality requirements.
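The assertion id used in the assertion URN has to stay the same across runs so that each new result attaches to the same assertion. One way to get that, sketched here with only the standard library, is to hash the fields that identify the check. The helper name `make_assertion_urn`, the tool name `my-dq-tool`, the check name `not_null_id`, and the choice of key fields are illustrative assumptions, not part of the DataHub SDK.

```python
import hashlib
import json


def make_assertion_urn(platform: str, dataset_urn: str, check_name: str) -> str:
    """Derive a stable assertion URN from the fields that identify a check.

    Hypothetical helper: the key fields and the hashing scheme are
    assumptions chosen for illustration, not a DataHub requirement.
    """
    key = {"platform": platform, "dataset": dataset_urn, "check": check_name}
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(key, sort_keys=True, separators=(",", ":"))
    assertion_id = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return f"urn:li:assertion:{assertion_id}"


urn_a = make_assertion_urn("my-dq-tool", "urn:li:dataset:<dataset-id>", "not_null_id")
urn_b = make_assertion_urn("my-dq-tool", "urn:li:dataset:<dataset-id>", "not_null_id")
print(urn_a == urn_b)  # identical inputs yield the identical URN
```

Reusing the same URN for both the assertion info and every later run event keeps the run history of a check in one place instead of creating a new assertion on every execution.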
Would you like more detailed information or examples on any specific part of this process?
Sources:
- Slack thread
<@U06TM7M40RK> I’m trying to create a data contract via the DataHub UI, but I get the following message: failed to create data contract! an unexpected error occurred
When encountering the error message “failed to create data contract! an unexpected error occurred” while trying to create a data contract via the DataHub UI, there are several potential causes and steps you can take to troubleshoot and resolve the issue:
- Check DataHub Version: Ensure that you are using a version of DataHub that supports data contracts. Some features might not be fully supported in older versions or the open-source version. According to <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract|DataHub documentation>, data contracts are more robustly supported in the managed version of DataHub.
- Review Logs: Check the logs of the DataHub services (frontend, GMS, etc.) for any error messages or stack traces that might provide more context about the failure. This can help identify if there is a specific issue with the configuration or a missing dependency.
- Verify Assertions: Ensure that the necessary assertions (freshness, schema, data quality) are created and properly configured. The error might occur if the data contract is missing required assertions. As per the <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract#ui|DataHub UI guide>, these assertions must be in place before creating a data contract.
- Check Permissions: Make sure that the user account you are using has the necessary permissions to create data contracts. This includes having the “Edit Data Contract” privilege for the dataset in question.
- Configuration Issues: Ensure that your DataHub instance is correctly configured to support data contracts. This might involve checking environment variables, configuration files, or ensuring that all necessary services are running.
- Network and Connectivity: Verify that there are no network issues or connectivity problems between the DataHub UI and backend services.
If these steps do not resolve the issue, you may need to reach out to the DataHub community or support channels for further assistance. They might be able to provide more specific guidance based on the logs and error messages you are encountering.
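For the log review step, a small filter can make it easier to spot the relevant entries in a long service log. The sketch below uses only the standard library and assumes that error lines contain the word `ERROR` or `Exception` and mention "contract"; both patterns and the function name `find_contract_errors` are assumptions to adjust to your deployment's actual log format. The sample lines are made up for the demonstration and are not real DataHub output.

```python
import re
from typing import Iterable, List

# Assumed markers; adjust them to match how your deployment formats its logs.
ERROR_PATTERN = re.compile(r"\b(ERROR|Exception)\b")
TOPIC_PATTERN = re.compile(r"contract", re.IGNORECASE)


def find_contract_errors(lines: Iterable[str], context: int = 2) -> List[str]:
    """Return error lines that mention contracts, plus `context` following lines."""
    buffered = list(lines)
    hits: List[str] = []
    last_emitted = -1  # index of the last line already copied into hits
    for i, line in enumerate(buffered):
        if ERROR_PATTERN.search(line) and TOPIC_PATTERN.search(line):
            # Skip lines already emitted as context for an earlier match.
            start = max(i, last_emitted + 1)
            end = min(len(buffered), i + context + 1)
            hits.extend(buffered[start:end])
            last_emitted = max(last_emitted, end - 1)
    return hits


# Made-up sample lines for illustration only.
sample_log = [
    "INFO request received",
    "ERROR failed to create data contract",
    "    <stack trace line>",
    "INFO request finished",
]
matches = find_contract_errors(sample_log, context=1)
for entry in matches:
    print(entry)
```

Including the matched lines and surrounding stack trace when asking the community for help gives them something concrete to work from.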
Would you like more detailed instructions on any of these troubleshooting steps?
Sources:
- <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract|DataHub documentation>
- <https://datahubproject.io/docs/0.14.0/managed-datahub/observe/data-contract#ui|DataHub UI guide>