Boosting Entities in Search Results with DataHub's Custom Search Configuration

Original Slack Thread

Hi all, is it possible to boost some entities in the search results?
For example, for the keyword “active user”, I would like “active user A” to rank above “active user B”, because that’s what users usually need.

<@U06TM7M40RK> are you able to assist here?

Yes, I can assist with that. DataHub's search customization feature lets you boost specific entities: you add a custom search configuration that adjusts how results are ranked for the queries it matches.

Steps to Boost Entities in Search Results

  1. Enable Custom Search Configuration:
    Ensure that custom search configuration is enabled in your DataHub instance. You can do this by setting the following environment variables in your GMS configuration:

    ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED: true
    ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE: search_config.yml
    
  2. Create a Custom Search Configuration File:
    Create a search_config.yml file that defines the custom search logic. The example below applies to every query (queryRegex: .*) and multiplies by 3 the score of any entity whose name.keyword field matches "active user A":

    queryConfigurations:
      - queryRegex: .*
        simpleQuery: true
        prefixMatchQuery: true
        exactMatchQuery: true
        functionScore:
          functions:
            - filter:
                terms:
                  name.keyword:
                    - "active user A"
              weight: 3.0
          score_mode: multiply
          boost_mode: multiply
    
  3. Deploy the Configuration:
    Place the search_config.yml file at the path given by the ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE environment variable, where the GMS process can read it. If you deploy with Helm, you can supply this configuration through values file overrides.

  4. Restart DataHub Services:
    After deploying the configuration file, restart the DataHub services to apply the new search configuration.
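
Because a typo in the file only shows up after a restart, it can help to sanity-check the configuration first. The following is a minimal sketch of such a check, not part of DataHub: check_search_config and EXAMPLE_CONFIG are hypothetical names, the configuration is written as the dict a YAML parser would produce, and Python's regex dialect is assumed to be close enough to the one GMS uses for simple patterns. The allowed score_mode and boost_mode values are those of Elasticsearch's function_score query.

```python
import re

# Modes Elasticsearch accepts for function_score.
SCORE_MODES = {"multiply", "sum", "avg", "first", "max", "min"}
BOOST_MODES = {"multiply", "replace", "sum", "avg", "max", "min"}


def check_search_config(config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for i, entry in enumerate(config.get("queryConfigurations", [])):
        where = f"queryConfigurations[{i}]"
        regex = entry.get("queryRegex")
        if not isinstance(regex, str):
            problems.append(f"{where}: queryRegex is missing")
        else:
            try:
                # Assumption: Python's regex syntax approximates the one GMS uses.
                re.compile(regex)
            except re.error as exc:
                problems.append(f"{where}: invalid queryRegex ({exc})")
        function_score = entry.get("functionScore", {})
        if function_score.get("score_mode", "multiply") not in SCORE_MODES:
            problems.append(f"{where}: unknown score_mode")
        if function_score.get("boost_mode", "multiply") not in BOOST_MODES:
            problems.append(f"{where}: unknown boost_mode")
        for j, function in enumerate(function_score.get("functions", [])):
            weight = function.get("weight")
            # A function may omit weight; when present, anything other than a
            # non-negative number is treated here as suspicious.
            if weight is not None and (
                isinstance(weight, bool)
                or not isinstance(weight, (int, float))
                or weight < 0
            ):
                problems.append(f"{where}.functions[{j}]: bad weight {weight!r}")
    return problems


# The configuration from step 2, written as a plain dict.
EXAMPLE_CONFIG = {
    "queryConfigurations": [
        {
            "queryRegex": ".*",
            "simpleQuery": True,
            "prefixMatchQuery": True,
            "exactMatchQuery": True,
            "functionScore": {
                "functions": [
                    {
                        "filter": {"terms": {"name.keyword": ["active user A"]}},
                        "weight": 3.0,
                    }
                ],
                "score_mode": "multiply",
                "boost_mode": "multiply",
            },
        }
    ]
}

print(check_search_config(EXAMPLE_CONFIG))
```

To run this against the real file, the dict would come from a YAML parser instead of being written by hand. The check only catches structural slips such as a misspelled mode; it does not confirm that the filter matches any entity in your index.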

Example Configuration

Here is a more detailed example configuration to boost “active user A” over “active user B”:

queryConfigurations:
  - queryRegex: .*
    simpleQuery: true
    prefixMatchQuery: true
    exactMatchQuery: true
    functionScore:
      functions:
        - filter:
            terms:
              name.keyword:
                - "active user A"
          weight: 3.0
        - filter:
            terms:
              name.keyword:
                - "active user B"
          weight: 1.0
      score_mode: multiply
      boost_mode: multiply

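This configuration applies a weight of 3.0 to “active user A”, so its score is multiplied by 3 and it ranks above “active user B” unless B's unboosted score is more than three times higher. The weight of 1.0 for “active user B” leaves its score unchanged and is listed only to make the comparison explicit.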
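
To see how the weights change the ordering, here is a small worked sketch in Python. It follows Elasticsearch's function_score rules for score_mode: multiply and boost_mode: multiply, but the query scores (2.0, 2.5, 2.2), the entity “active user C”, and the names WEIGHTS and boosted_score are made up for illustration; real scores depend on your index and query.

```python
from math import prod

# Function weights from the configuration above, keyed by the name.keyword value.
WEIGHTS = {"active user A": 3.0, "active user B": 1.0}


def boosted_score(name, query_score):
    """Combine a query score with the matching function weights."""
    # score_mode: multiply -> product of the weights of every matching function.
    # A document matching no filter gets the neutral factor 1.0.
    matching = [weight for key, weight in WEIGHTS.items() if key == name]
    function_score = prod(matching, start=1.0)
    # boost_mode: multiply -> query score times the combined function score.
    return query_score * function_score


# Hypothetical unboosted query scores; B starts slightly ahead of A.
query_scores = {
    "active user A": 2.0,
    "active user B": 2.5,
    "active user C": 2.2,
}

ranked = sorted(
    ((name, boosted_score(name, score)) for name, score in query_scores.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score}")
```

With these made-up inputs, “active user A” moves from last place to first (2.0 × 3.0 = 6.0), while the other two keep their original scores.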

Would you like more detailed guidance on any of these steps?
