<@U06TM7M40RK> I have DataHub set up and have configured Great Expectations assertions. The assertions are working fine locally.
I want to push them to DataHub using the great_expectations and acryl-datahub plugins. Can you give me all the steps that will be required?
To push Great Expectations assertions to DataHub using the acryl-datahub
plugin, you can follow these steps:
- Install the Required Plugin: Ensure that you have the acryl-datahub-gx-plugin installed in your Great Expectations environment. You can do this by running the following command:
    pip install 'acryl-datahub-gx-plugin'
- Configure DataHubValidationAction: You need to add the DataHubValidationAction to your Great Expectations Checkpoint configuration. This action will send the assertions and their results to DataHub. Here is an example configuration:
    action_list:
      - name: datahub_action
        action:
          module_name: datahub_gx_plugin.action
          class_name: DataHubValidationAction
          server_url: http://localhost:8080 # Replace with your DataHub server URL
Configuration Options:
- server_url (required): The URL of the DataHub GMS endpoint.
- env (optional): The environment to use in the namespace when constructing dataset URNs.
- exclude_dbname (optional): Exclude the database name/catalog when constructing dataset URNs.
- platform_alias (optional): Platform alias when constructing dataset URNs.
- platform_instance_map (optional): Maps the GX 'data source' name to a platform instance on DataHub.
- graceful_exceptions (optional): If set to true, most runtime errors in the lineage backend will be suppressed.
- token (optional): Bearer token used for authentication.
- timeout_sec (optional): Per-HTTP request timeout.
- retry_status_codes (optional): Retry HTTP request on these status codes.
- retry_max_times (optional): Maximum times to retry if HTTP request fails.
- extra_headers (optional): Extra headers to be added to the DataHub request.
- parse_table_names_from_sql (optional): Enable SQL parsing to try to parse the datasets being asserted.
- convert_urns_to_lowercase (optional): Convert dataset URNs to lowercase.
- Run the Checkpoint: After configuring the DataHubValidationAction, run your Great Expectations Checkpoint. This will execute the assertions and send the results to DataHub.
- Debugging: If you encounter issues, you can enable debug logging by setting the environment variable DATAHUB_DEBUG to true.
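As a rough sketch of the configuration and debugging steps above, the action entry can also be assembled in Python as a plain dictionary before it goes into the Checkpoint configuration. The helper name build_datahub_action is hypothetical and not part of either library; the option names are the ones listed above, and the localhost URL and the env and timeout values are placeholders.

```python
import os

# Option names copied from the list above; anything else is rejected early.
KNOWN_OPTIONS = {
    "env", "exclude_dbname", "platform_alias", "platform_instance_map",
    "graceful_exceptions", "token", "timeout_sec", "retry_status_codes",
    "retry_max_times", "extra_headers", "parse_table_names_from_sql",
    "convert_urns_to_lowercase",
}


def build_datahub_action(server_url, **options):
    """Return one action_list entry for DataHubValidationAction (hypothetical helper)."""
    if not server_url:
        raise ValueError("server_url is required")
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    action = {
        "module_name": "datahub_gx_plugin.action",
        "class_name": "DataHubValidationAction",
        "server_url": server_url,
    }
    action.update(options)
    return {"name": "datahub_action", "action": action}


# Enable debug logging for this process before the checkpoint runs.
os.environ["DATAHUB_DEBUG"] = "true"

# Placeholder values for illustration only.
entry = build_datahub_action("http://localhost:8080", env="PROD", timeout_sec=30)
print(entry)
```

The resulting dictionary has the same shape as the example configuration above, so it can be placed in the checkpoint's action_list. Rejecting unknown keys up front is just one way to catch typos in option names before the checkpoint runs.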
For more detailed information, you can refer to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/integration_docs/great-expectations.md|Great Expectations integration guide>.
Would you like more detailed guidance on any specific step?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/integration_docs/great-expectations.md|Great Expectations Integration Guide>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> I don't want to use datahub-gx-plugin. Can I keep using great_expectations and acryl-datahub, just like you suggested below?
Sources:
It seems likely that you can use Great Expectations with DataHub without relying on the datahub-gx-plugin. Instead, you can use the acryl-datahub package to push assertions to DataHub. Here are some steps you might consider:
- Install Acryl-DataHub: Ensure that you have the acryl-datahub package installed. You can do this by running:
    pip install acryl-datahub
- Configure Great Expectations: You can configure your Great Expectations setup to use the DataHubValidationAction directly in your Checkpoint configuration. This action will send the validation results to DataHub. Here is an example configuration:
    action_list:
      - name: datahub_action
        action:
          module_name: datahub.integrations.great_expectations.action
          class_name: DataHubValidationAction
          server_url: http://localhost:8080 # Replace with your DataHub server URL
- Run the Checkpoint: Execute your Great Expectations Checkpoint. This will run the assertions and send the results to DataHub.
- Troubleshooting: If you encounter issues, ensure that your DataHub server URL and any required authentication tokens are correctly configured.
It seems that the integration between Great Expectations and DataHub is supported, but it requires the use of the DataHubValidationAction to send results to DataHub. This approach does not require the datahub-gx-plugin.
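For the troubleshooting step, a small pre-flight check of the server URL and token can rule out the simplest misconfigurations before the checkpoint runs. This is only a sketch: check_datahub_settings is a hypothetical helper, it does not contact DataHub, and the rules it applies (http/https scheme, non-empty host, non-blank token) are assumptions about what "correctly configured" means here.

```python
from urllib.parse import urlparse


def check_datahub_settings(server_url, token=None):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    parsed = urlparse(server_url or "")
    if parsed.scheme not in ("http", "https"):
        problems.append("server_url should start with http:// or https://")
    if not parsed.netloc:
        problems.append("server_url is missing a host")
    # A token is optional, but a blank one is almost certainly a mistake.
    if token is not None and not token.strip():
        problems.append("token is set but empty")
    return problems


# Placeholder values for illustration only.
print(check_datahub_settings("http://localhost:8080"))
print(check_datahub_settings("localhost:8080", token=" "))
```

An empty list only means the values look well-formed; it says nothing about whether the server is reachable or the token is accepted.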
Would you like more detailed guidance on any specific step?
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1698901771.114549
- https://datahubspace.slack.com/archives/CUMUWQU66/p1725556193.606389
<@U06TM7M40RK> We tried the steps you gave above, but we get the error below when we run the checkpoint:
No module named datahub.integrations.great_expectations.action could be found in your plugins directory.
- Please verify your plugins directory is configured correctly.
- Please verify you have a module named datahub.integrations.great_expectations.action in your plugins directory.
The error message indicates that the module datahub.integrations.great_expectations.action is not found in your environment. This suggests that the necessary integration module is not installed or not correctly configured. Here are some steps to troubleshoot and resolve this issue:
- Verify Installation: Ensure that the acryl-datahub package is installed correctly. You can verify this by running:
    pip list | grep acryl-datahub
  If it's not installed, you can install it using:
    pip install acryl-datahub
- Check the Module Path: Ensure that the module path datahub.integrations.great_expectations.action is correct. The module should be part of the acryl-datahub package. If the path is incorrect, you may need to adjust your configuration.
- Reinstall the Package: Sometimes, reinstalling the package can resolve issues with missing modules. Try uninstalling and then reinstalling the acryl-datahub package:
    pip uninstall acryl-datahub
    pip install acryl-datahub
- Verify Python Environment: Ensure that you are using the correct Python environment where acryl-datahub is installed. If you are using a virtual environment, make sure it is activated.
- Check for Known Issues: There might be known issues or updates related to the integration. You can check the <https://datahubproject.io/docs/0.13.1/metadata-ingestion/integration_docs/great-expectations|DataHub Great Expectations Integration Guide> for any updates or additional configuration steps.
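The installation, module-path, and environment checks above can be run from inside the same interpreter that executes the checkpoint. The sketch below uses only the Python standard library; module_available and installed_version are hypothetical helper names, and the two module paths are the ones that appear in this thread. Whether a given acryl-datahub release ships datahub.integrations.great_expectations.action is not confirmed here, so the output should be read as a report on this environment rather than a diagnosis.

```python
import importlib.util
import sys
from importlib import metadata


def module_available(dotted_name):
    """True if the module can be located in the current environment."""
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


def installed_version(dist_name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Shows which interpreter (and therefore which virtual environment) is in use.
print("python:", sys.executable)
for dist in ("acryl-datahub", "acryl-datahub-gx-plugin"):
    print(dist, "->", installed_version(dist))
for mod in (
    "datahub.integrations.great_expectations.action",
    "datahub_gx_plugin.action",
):
    print(mod, "->", module_available(mod))
```

If the first module path reports False while the second reports True, the module_name in the Checkpoint configuration would need to match whichever path is actually importable.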
If these steps do not resolve the issue, please provide more details about your environment and configuration, and I can assist further.
Sources:
- <https://datahubproject.io/docs/0.13.1/metadata-ingestion/integration_docs/great-expectations|DataHub Great Expectations Integration Guide>
- <https://forum.datahubproject.io/t/troubleshooting-missing-module-named-datahub-integrations-great-expectations-action-for-datahub-validation-action/940|Troubleshooting Missing Module>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @ RunLLM if you still need further assistance !
Or tag @Datahub Community Support if you require a person to take a look
Hope this helps ! ><