The data search functionality in DataHub, which uses the /api/graphql API, has a limitation: string properties cannot exceed 256 characters. For instance, when a file name exceeds this limit (e.g., 261 characters), the API fails to retrieve results. However, if the character count is 256 or less, the API retrieves results successfully. Is there a way to increase the character limit for string properties in a field, either through configuration adjustments or through the Elasticsearch Helm chart?
cc <@U06TM7M40RK> <@U02TYQ4SPPD> <@U06CMSR2S1E> <@U03MF8MU5P0>
I’m not able to find any data sources to answer your question. You might want to provide details about your current Elasticsearch configuration or any specific error messages you’re encountering. If this doesn’t help, please reach out to the DataHub support team for further assistance.
Hint: Mention @runllm in the thread for followups.
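One way to gather the configuration details asked for above is to list which keyword fields in an index mapping carry an ignore_above limit. The sketch below walks the "properties" part of a mapping document. The helper name keyword_limits and the sample mapping are hypothetical and only illustrate the shape; they are not taken from a real DataHub index.

```python
from typing import Dict, Iterator, Tuple


def keyword_limits(properties: Dict, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (field path, ignore_above) for each keyword field that sets a limit."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "keyword" and "ignore_above" in spec:
            yield path, spec["ignore_above"]
        # Multi-fields live under "fields"; object sub-fields live under "properties".
        for key in ("fields", "properties"):
            if key in spec:
                yield from keyword_limits(spec[key], prefix=f"{path}.")


# Hypothetical fragment, shaped like the "properties" section of a
# GET /<index>/_mapping response (assumption: not a real DataHub mapping).
sample_properties = {
    "name": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "owner": {"properties": {"urn": {"type": "keyword"}}},
}

limits = dict(keyword_limits(sample_properties))
print(limits)
```

Running this against the real mapping of the search index would show whether the field in question is capped at 256.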
<@U06TM7M40RK> is this a valid configuration for elasticsearch.yml?
    esConfig:
      elasticsearch.yml: |
        index:
          analysis:
            analyzer:
              default:
                type: standard
          mapping:
            ignore_above: 1024
The configuration snippet you provided for elasticsearch.yml is not entirely valid as it stands. Let’s break it down:
- Analyzer Configuration: The analysis section is correctly placed under index. However, the analyzer configuration should specify the type and any additional settings for the analyzer. The default analyzer is typically the standard analyzer, which is a built-in analyzer in Elasticsearch.
- Mapping Configuration: The ignore_above setting is used within a field mapping to specify that strings longer than a certain length should not be indexed. This setting should be placed under a specific field mapping, not directly under index.
Here’s a more structured example of how you might configure these settings in elasticsearch.yml:
    index:
      analysis:
        analyzer:
          default:
            type: standard
    mappings:
      properties:
        your_field_name:
          type: text
          fields:
            keyword:
              type: keyword
              ignore_above: 1024
In this example:
- The default analyzer is set to standard.
- The ignore_above setting is applied to a specific field (your_field_name) within the mappings section. This ensures that only the keyword subfield of your_field_name will ignore strings longer than 1024 characters.
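To make the ignore_above rule concrete, here is a minimal sketch that mimics it in plain Python. It assumes the 256-character cutoff from the original question comes from an ignore_above limit on a keyword subfield; the function name is hypothetical and nothing here talks to Elasticsearch.

```python
def is_indexed_as_keyword(value: str, ignore_above: int) -> bool:
    """Mimic the keyword-field rule: values longer than ignore_above are skipped."""
    return len(value) <= ignore_above


# Lengths taken from the question: 256 characters works, 261 does not.
short_name = "a" * 256
long_name = "a" * 261

results = {
    limit: (is_indexed_as_keyword(short_name, limit), is_indexed_as_keyword(long_name, limit))
    for limit in (256, 1024)
}
print(results)
```

Under that assumption, the 261-character name is skipped with a limit of 256 and indexed with a limit of 1024, which matches the behavior described in the question.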
If you need more detailed guidance on configuring Elasticsearch, please let me know!
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1725338147.048359
- https://datahubspace.slack.com/archives/C029A3M079U/p1725213163.170349
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><
The configuration you provided should be placed within the elasticsearch.yml file under the appropriate section for index settings. However, it’s important to note that elasticsearch.yml is typically used for node-level settings, and index-specific settings like mappings and analysis are usually defined when creating an index or updating its settings via the Elasticsearch API.
Here’s how the two kinds of settings are divided:
- Node-Level Settings: elasticsearch.yml is primarily for node-level configurations such as cluster settings, node roles, and network configurations.
- Index-Level Settings: For index-specific settings like mappings and analysis, you should use the Elasticsearch API to create or update an index.
Here’s an example of how you might use the Elasticsearch API to apply these settings:
    PUT /your_index_name
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "default": {
              "type": "standard"
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "your_field_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 1024
              }
            }
          }
        }
      }
    }
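Steps to Apply the Configuration:
- Create or Update Index: Use the above JSON configuration with the Elasticsearch API to create a new index. The create request fails if the index already exists, so for an existing index, send the mapping change through the update mapping API (PUT /your_index_name/_mapping) instead.
- Use Kibana or cURL: You can apply this configuration using Kibana’s Dev Tools or a cURL command.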
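As an alternative to Kibana or cURL, the same create-index request can be put together with Python’s standard library. This is only a sketch: the base URL http://localhost:9200 and the helper name build_create_index_request are assumptions, your_index_name and your_field_name are the placeholders from the example above, and the request is built here but not sent.

```python
import json
import urllib.request


def build_create_index_request(
    base_url: str, index: str, field: str, ignore_above: int
) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index> request with the settings shown above."""
    body = {
        "settings": {"analysis": {"analyzer": {"default": {"type": "standard"}}}},
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": ignore_above}
                    },
                }
            }
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local endpoint; replace with the address of your own cluster.
req = build_create_index_request(
    "http://localhost:9200", "your_index_name", "your_field_name", 1024
)
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(req). Clusters with security enabled will also need credentials, which this sketch leaves out.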
If you need more detailed steps on how to apply these settings using the Elasticsearch API, please let me know!
Sources: