Troubleshooting 0.12.1 Upgrade Job Stuck Running with Recurring Error

Original Slack Thread

Hi, I’m trying to run the 0.12.1 upgrade job but it seems to be stuck running with this recurring error in the logs:

```
Options[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":1000,"query":{"bool":{"must":[{"function_score":{"query":{"bool":{"should":[{"bool":{"should":[{"simple_query_string":{"query":"*","fields":["urn^10.0","displayName^1.0","description^1.0"],"analyzer":"keyword","flags":-1,"default_operator":"and","analyze_wildcard":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"fuzzy_transpositions":true,"boost":1.0}},{"simple_query_string":{"query":"*","fields":["displayName.delimited^0.4","description.delimited^0.4","urn.delimited^5.0"],"analyzer":"query_word_delimited","flags":-1,"default_operator":"and","analyze_wildcard":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"fuzzy_transpositions":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},{"bool":{"should":[{"term":{"urn":{"value":"*","boost":100.0,"_name":"urn"}}},{"term":{"urn":{"value":"*","case_insensitive":true,"boost":70.0,"_name":"urn"}}},{"match_phrase_prefix":{"urn.delimited":{"query":"*","slop":0,"max_expansions":50,"zero_terms_query":"NONE","boost":5.6,"_name":"urn"}}},{"term":{"displayName.keyword":{"value":"*","boost":10.0,"_name":"displayName"}}},{"term":{"displayName.keyword":{"value":"*","case_insensitive":true,"boost":7.0,"_name":"displayName"}}},{"term":{"description.keyword":{"value":"*","boost":10.0,"_name":"description"}}},{"term":{"description.keyword":{"value":"*","case_insensitive":true,"boost":7.0,"_name":"description"}}},{"match_phrase_prefix":{"description.delimited":{"query":"*","slop":0,"max_expansions":50,"zero_terms_query":"NONE","boost":0.448,"_name":"description"}}},{"match_phrase_prefix":{"displayName.delimited":{"query":"*","slop":0,"max_expansions":50,"zero_terms_query":"NONE","boost":0.448,"_name":"displayName"}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"functions":[{"filter":{"match_all":{"boost":1.0}},"weight":1.0}],"score_mode":"avg","boost_mode":"multiply","max_boost":3.4028235E38,"boost":1.0}}],"filter":[{"bool":{"must_not":[{"match":{"removed":{"query":true,"operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["urn"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"urn":{"order":"asc"}}]}, cancelAfterTimeInterval=null, pipeline=null}
```

If I try re-running the 0.12.1 Elasticsearch setup job, I see this in the log:

```
{"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [datahub_usage_event] has more than one write index [datahub_usage_event-000001,datahub_usage_event-000392]"}],"type":"illegal_state_exception","reason":"alias [datahub_usage_event] has more than one write index [datahub_usage_event-000001,datahub_usage_event-000392]"},"status":500}
```

It is deployed in Kubernetes, without Helm charts.

Hi Sufyan, are you able to share any more on the datahub_usage_event indices? Specifically datahub_usage_event-000001?

Hi, are there any commands I should run to get the additional info for those indices?
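For anyone with the same question: the conflict shows up in the response to `GET _alias/datahub_usage_event` (or, more compactly, `GET _cat/aliases/datahub_usage_event?v`), where each index reports its own `is_write_index` flag. As a small sketch, the helper below filters that response for indices claiming the write role; the sample payload is illustrative, modeled on the error in this thread, not captured from a real cluster:

```python
def write_indices(alias_response: dict, alias: str) -> list[str]:
    """Return the indices that claim to be the write index for `alias`.

    `alias_response` is the JSON body returned by `GET _alias/<alias>`:
    {"<index>": {"aliases": {"<alias>": {"is_write_index": true}}}, ...}
    """
    return sorted(
        index
        for index, body in alias_response.items()
        if body.get("aliases", {}).get(alias, {}).get("is_write_index", False)
    )

# Illustrative response mirroring the error above: two indices both
# marked as the write index for the datahub_usage_event alias.
sample = {
    "datahub_usage_event-000001": {
        "aliases": {"datahub_usage_event": {"is_write_index": True}}
    },
    "datahub_usage_event-000392": {
        "aliases": {"datahub_usage_event": {"is_write_index": True}}
    },
}

conflict = write_indices(sample, "datahub_usage_event")
print(conflict)  # a healthy alias has exactly one write index
```

Anything more than one entry in that list reproduces the `illegal_state_exception` the setup job reports.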

<@U05BW5BFX0R> Hi Ellie, I have faced the same issue. Here is my error message and datahub_usage_event indices.

```
2024/02/22 07:51:58 Received 200 from https://vpc-datahub-uvsazv554daj62ce36m6na46x4.ap-northeast-2.es.amazonaws.com:443
2024-02-22 16:51:58  going to use protocol: https
2024-02-22 16:51:58  going to use default elastic headers
2024-02-22 16:51:58  not using any prefix
2024-02-22 16:51:58  datahub_analytics_enabled: true
2024-02-22 16:51:58  >>> GET _opendistro/_ism/policies/datahub_usage_event_policy response code is 200
2024-02-22 16:51:58  >>> _opendistro/_ism/policies/datahub_usage_event_policy already exists ✓
2024-02-22 16:51:58  >>> GET _template/datahub_usage_event_index_template response code is 200
2024-02-22 16:51:58  >>> _template/datahub_usage_event_index_template already exists ✓
2024-02-22 16:51:59  >>> GET datahub_usage_event-000001 response code is 404
2024-02-22 16:51:59  >>> creating datahub_usage_event-000001 because it doesn't exist ...
2024-02-22 16:51:59  {
2024-02-22 16:51:59    "aliases": {
2024-02-22 16:51:59      "datahub_usage_event": {
2024-02-22 16:51:59        "is_write_index": true
2024-02-22 16:51:59      }
2024-02-22 16:51:59    }
2024-02-22 16:51:59  }
2024/02/22 07:51:59 Command finished successfully.

{"error":{"root_cause":[{"type":"illegal_state_exception","reason":"alias [datahub_usage_event] has more than one write index [datahub_usage_event-000001,datahub_usage_event-000671]"}],"type":"illegal_state_exception","reason":"alias [datahub_usage_event] has more than one write index [datahub_usage_event-000001,datahub_usage_event-000671]"},"status":500}
```


I fixed this issue by deleting the extra write index. datahub_usage_event-00617 was a write index, so I deleted it. After that, the datahub-elasticsearch-setup-job completed without any error message.
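Deleting the stray index works, but it discards that index's data. A gentler alternative, if whichever of the two indices should no longer receive writes still holds usage events you want to keep, is to demote it via Elasticsearch's `_aliases` API: an `add` action that re-adds the index to the alias with `is_write_index: false`. The sketch below only builds that request body with the standard library; the index name is the one from this thread, so substitute your own, and POST the body to `<your-elasticsearch-host>/_aliases` with `Content-Type: application/json`:

```python
import json

def demote_write_index(alias: str, index: str) -> str:
    """Build the POST /_aliases body that keeps `index` attached to
    `alias` but clears its is_write_index flag."""
    body = {
        "actions": [
            {"add": {"index": index, "alias": alias, "is_write_index": False}}
        ]
    }
    return json.dumps(body)

# Index name taken from this thread; replace with the index that
# should stop being the write target in your cluster.
payload = demote_write_index("datahub_usage_event", "datahub_usage_event-000001")
print(payload)
```

After the alias has exactly one write index again, re-running the elasticsearch-setup job should succeed without deleting any data.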