Integrating DataHub with Great Expectations: Handling URN Errors

Original Slack Thread

Hey there! :hello-dog:
I have a question about the DataHub integration with Great Expectations. We are using DataHub 0.12.0 and GE version 0.15.50.
We created a datasource in the `great_expectations.yml` file like this:

```
datasources:
  my_bigquery_datasource:
    name: my_bigquery_datasource
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      module_name: great_expectations.execution_engine
      connection_string: bigquery://myproject/tmp
    data_connectors:
      default_runtime_data_connector_name:
        name: default_runtime_data_connector_name
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - default_identifier_name
      default_inferred_data_connector_name:
        name: default_inferred_data_connector_name
        class_name: InferredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
        include_schema_name: true
```
When I run `print(context.get_available_data_asset_names())`, all tables from all datasets in my GCP project appear. My intention is to use `tmp` only as the dataset where GE saves the temporary tables it generates. So far so good. However, the problem begins when I try to send the validations to DataHub. We configure the checkpoint as follows (only the main part of the file is shown):
```
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DataHubValidationAction
      server_url: http://datahub-gms:8080
validations:
  - batch_request:
      datasource_name: my_bigquery_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: mydataset.mytable
      data_connector_query:
        index: -1
    expectation_suite_name: myproject.mydataset.mytable
```
With this configuration, sending the results to DataHub fails because the URN includes the `tmp` dataset instead of `mydataset`, even though I am explicitly setting `mydataset` in the `data_asset_name`. So it seems that DataHub builds the URN from the connection string defined in `great_expectations.yml`: when I changed the dataset in the datasource's connection string, it worked.
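
To make the mismatch concrete, here is a minimal sketch of the two URNs involved. It is not DataHub's actual code: the function names are hypothetical, the `PROD` environment is an assumption, and the only thing taken as given is the general layout of a DataHub dataset URN. The first function mimics the behavior observed above (dataset taken from the connection string), the second shows the URN we would expect (dataset taken from `data_asset_name`).

```python
from urllib.parse import urlsplit


def dataset_urn(project, dataset, table, env="PROD"):
    # General DataHub dataset URN layout; env="PROD" is an assumed default.
    name = f"{project}.{dataset}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:bigquery,{name},{env})"


def urn_from_connection_string(connection_string, data_asset_name):
    # Hypothetical helper reproducing the observed behavior:
    # project and dataset both come from bigquery://<project>/<dataset>,
    # and only the table part of data_asset_name is used.
    parts = urlsplit(connection_string)
    project = parts.netloc
    dataset = parts.path.strip("/")
    table = data_asset_name.split(".")[-1]
    return dataset_urn(project, dataset, table)


def urn_from_asset_name(connection_string, data_asset_name):
    # Hypothetical helper showing the desired behavior:
    # the project comes from the connection string,
    # the dataset and table come from "<dataset>.<table>".
    project = urlsplit(connection_string).netloc
    dataset, table = data_asset_name.split(".", 1)
    return dataset_urn(project, dataset, table)


if __name__ == "__main__":
    conn = "bigquery://myproject/tmp"
    asset = "mydataset.mytable"
    print("observed:", urn_from_connection_string(conn, asset))
    print("expected:", urn_from_asset_name(conn, asset))
```

The observed URN points at `myproject.tmp.mytable`, which does not exist in DataHub, while the expected one points at `myproject.mydataset.mytable`.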

Is there a way to force the URN to use the `data_asset_name` instead of the connection string? Or maybe configure another parameter inside checkpoint so we can set the desired dataset?
We have several datasets in the GCP project, and performance will be really bad if we define one datasource per dataset. Besides, it doesn't make sense to define many datasources for the same project with different datasets, since each of them will retrieve all data assets anyway.

Thanks in advance!

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)