Troubleshooting Elasticsearch Configuration for String Properties

user-2 · February 3, 2025, 12:04am

The data search functionality in DataHub, which uses the /api/graphql API, has a limitation: string properties cannot exceed 256 characters. For instance, when a file name exceeds this limit (e.g., 261 characters), the API fails to retrieve results. When the character count is 256 or fewer, the API retrieves results successfully. Is there a way to increase the character limit for string properties in a field, either through configuration adjustments or through the Elasticsearch Helm chart?
cc <@U06TM7M40RK> <@U02TYQ4SPPD> <@U06CMSR2S1E> <@U03MF8MU5P0>

user-1 · February 3, 2025, 12:04am

I’m not able to find any data sources to answer your question. You might want to provide details about your current Elasticsearch configuration or any specific error messages you’re encountering. If this doesn’t help, please reach out to the DataHub support team for further assistance.

user-2 · February 3, 2025, 12:04am

<@U06TM7M40RK> is this a valid configuration for elasticsearch.yml?
esConfig:
  elasticsearch.yml: |
    index:
      analysis:
        analyzer:
          default:
            type: standard
      mapping:
        ignore_above: 1024

user-1 · February 3, 2025, 12:04am

The configuration snippet you provided for elasticsearch.yml is not entirely valid as it stands. Let’s break it down:

Analyzer Configuration:
- The analysis section is correctly placed under index. However, the analyzer configuration should specify the type and any additional settings for the analyzer. The default analyzer is typically the standard analyzer, which is a built-in analyzer in Elasticsearch.
Mapping Configuration:
- The ignore_above setting is used within a field mapping to specify that strings longer than a certain length should not be indexed. This setting should be placed under a specific field mapping, not directly under index.

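Here’s a more structured example of how these settings nest. Note that mappings are index-level settings, which are normally applied through the Elasticsearch API rather than through elasticsearch.yml: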

index:
  analysis:
    analyzer:
      default:
        type: standard
  mappings:
    properties:
      your_field_name:
        type: text
        fields:
          keyword:
            type: keyword
            ignore_above: 1024

In this example:

- The default analyzer is set to standard.
- The ignore_above setting is applied to a specific field (your_field_name) within the mappings section. Only the keyword subfield of your_field_name skips strings longer than 1024 characters; such strings are not indexed in that subfield.
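
To make the effect of ignore_above concrete, here is a small Python sketch that mimics the rule for a keyword field. It is only a simulation, not Elasticsearch code: the function name keyword_is_indexed is hypothetical, and it assumes the limit is compared against the character count, with 256 taken from the limit described in the original question.

```python
def keyword_is_indexed(value: str, ignore_above: int = 256) -> bool:
    """Mimic ignore_above: a value longer than the limit is skipped by the keyword field."""
    return len(value) <= ignore_above


short_name = "a" * 256  # exactly at the limit from the question
long_name = "a" * 261   # the failing case from the question

print(keyword_is_indexed(short_name))       # True
print(keyword_is_indexed(long_name))        # False
print(keyword_is_indexed(long_name, 1024))  # True
```

Under these assumptions, the 261-character file name only becomes searchable by exact match once the limit on the keyword subfield is raised above 261.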

If you need more detailed guidance on configuring Elasticsearch, please let me know!

Sources:

https://datahubspace.slack.com/archives/C029A3M079U/p1725338147.048359
https://datahubspace.slack.com/archives/C029A3M079U/p1725213163.170349

datahub_team · February 3, 2025, 12:04am

Hi! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look.

Hope this helps!

user-1 · February 3, 2025, 12:04am

The configuration you provided should not go in elasticsearch.yml. That file is typically used for node-level settings, whereas index-specific settings like mappings and analysis are usually defined when creating an index or updating its settings via the Elasticsearch API.

The two kinds of settings are handled differently:

- Node-Level Settings: elasticsearch.yml is primarily for node-level configuration such as cluster settings, node roles, and network configuration.
- Index-Level Settings: For index-specific settings like mappings and analysis, use the Elasticsearch API to create or update an index.
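
Before changing anything, it can help to check which limit an index currently applies. The Python sketch below walks a mapping of the shape returned by GET /your_index_name/_mapping and lists every field or subfield that sets ignore_above. The helper name iter_ignore_above and the sample response are hypothetical; in practice you would load the real response body instead.

```python
from typing import Dict, Iterator, Tuple


def iter_ignore_above(properties: Dict, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (field_path, ignore_above) for every field or subfield that sets it."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "ignore_above" in spec:
            yield path, spec["ignore_above"]
        # Multi-fields live under "fields"; object fields nest under "properties".
        for key in ("fields", "properties"):
            if key in spec:
                yield from iter_ignore_above(spec[key], prefix=f"{path}.")


# Hypothetical response body from GET /your_index_name/_mapping.
sample_response = {
    "your_index_name": {
        "mappings": {
            "properties": {
                "your_field_name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        }
    }
}

properties = sample_response["your_index_name"]["mappings"]["properties"]
limits = dict(iter_ignore_above(properties))
print(limits)  # {'your_field_name.keyword': 256}
```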

Here’s an example of how you might use the Elasticsearch API to apply these settings:

PUT /your_index_name
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "your_field_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}

Steps to Apply the Configuration:

- Create or Update Index: Send the above JSON with the Elasticsearch API to create a new index. For an index that already exists, the create request is rejected, so apply mapping changes through the update mapping API (PUT /your_index_name/_mapping) instead.
- Use Kibana or cURL: You can send the request from Kibana’s Dev Tools or with a cURL command.
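
As an alternative to Kibana or cURL, the same create-index request can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names build_index_body and build_put_request are hypothetical, http://localhost:9200 is a placeholder for your cluster address, and authentication and TLS are left out.

```python
import json
import urllib.request


def build_index_body(field_name: str, ignore_above: int) -> dict:
    """Build the create-index body shown above for one text field with a keyword subfield."""
    return {
        "settings": {"analysis": {"analyzer": {"default": {"type": "standard"}}}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": ignore_above}
                    },
                }
            }
        },
    }


def build_put_request(base_url: str, index_name: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /<index_name> request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder cluster address and names; replace them with your own.
request = build_put_request(
    "http://localhost:9200",
    "your_index_name",
    build_index_body("your_field_name", 1024),
)
print(request.get_method(), request.full_url)

# Uncomment to actually send the request to a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the send step lets you inspect the body before it reaches the cluster.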

If you need more detailed steps on how to apply these settings using the Elasticsearch API, please let me know!

