Hey there! :hello-dog:
I have a question about the DataHub integration with Great Expectations. We are using DataHub 0.12.0 and GE version 0.15.50.
We created a datasource inside the great_expectations.yml file like this:
```
name: my_bigquery_datasource
class_name: Datasource
module_name: great_expectations.datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  module_name: great_expectations.execution_engine
  connection_string: bigquery://myproject/tmp
data_connectors:
  default_runtime_data_connector_name:
    name: default_runtime_data_connector_name
    class_name: RuntimeDataConnector
    module_name: great_expectations.datasource.data_connector
    batch_identifiers:
      - default_identifier_name
  default_inferred_data_connector_name:
    name: default_inferred_data_connector_name
    class_name: InferredAssetSqlDataConnector
    module_name: great_expectations.datasource.data_connector
    include_schema_name: true
```
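For reference, this is roughly how we load the context and inspect that datasource (a minimal sketch assuming the default project layout; GE 0.15.x API):
```python
# Minimal sketch: load the GE context and list the assets the datasource exposes
import great_expectations as gx

context = gx.get_context()
# With InferredAssetSqlDataConnector this returns tables from every dataset
# in the project, not only `tmp`
print(context.get_available_data_asset_names())
```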
When I execute `print(context.get_available_data_asset_names())`, I noticed that all tables from all datasets inside my GCP project appear. My intention is to use `tmp` as the dataset where GE saves the temporary tables it generates. So far so good. However, I want to send the validation results to DataHub, and that is where the problem begins. We configured the checkpoint as follows (the code below is the main part of the file):
```
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DataHubValidationAction
      server_url: http://datahub-gms:8080
validations:
  - batch_request:
      datasource_name: my_bigquery_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: mydataset.mytable
      data_connector_query:
        index: -1
    expectation_suite_name: myproject.mydataset.mytable
```
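We then trigger it roughly like this (the checkpoint name `my_checkpoint` is just a placeholder):
```python
# Sketch of how we run the checkpoint that carries the DataHubValidationAction
import great_expectations as gx

context = gx.get_context()
result = context.run_checkpoint(checkpoint_name="my_checkpoint")
print(result.success)
```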
This setup fails when sending to DataHub, because the URN includes the `tmp` dataset instead of `mydataset`, even though I am explicitly setting `mydataset` in `data_asset_name`. So it seems DataHub builds the URN from the connection string defined in the `great_expectations.yml` file: when I changed the dataset in the datasource's connection string, it worked.
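To illustrate, the URN that gets emitted versus the one I would expect looks roughly like this (project/dataset/table names are placeholders):
```
urn:li:dataset:(urn:li:dataPlatform:bigquery,myproject.tmp.mytable,PROD)        <- what is emitted
urn:li:dataset:(urn:li:dataPlatform:bigquery,myproject.mydataset.mytable,PROD)  <- what I expect
```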
Is there a way to force the URN to use the `data_asset_name` instead of the connection string? Or is there maybe another parameter inside the checkpoint that would let us set the desired dataset?
We ask because we have several datasets inside the GCP project, and performance would be really bad if we defined one datasource per dataset. Besides, it wouldn't make sense to define many datasources for the same project with different datasets, since each of them retrieves all data assets anyway.
Thanks in advance!