Hello DataHub community,
How do we set the number of primary shards for the Elasticsearch index datahub_usage_event-XXXXXX
to 1, down from the default value of 5, using datahub-helm?
By default, each index has 1 shard and 1 replica.
If you want to change this for an already deployed DataHub,
resharding must be enabled by setting
ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX=true
and the shard count must be overridden for the specific index, for example
ELASTICSEARCH_INDEX_BUILDER_ENTITY_SETTINGS_OVERRIDES='{"datasetindex_v2":{"number_of_shards":"10"}}'
These settings are in values.yaml:
https://github.com/acryldata/datahub-helm/blob/ecb168d9255435476311c6d6808bf214f587e72c/charts/datahub/values.yaml#L412
Enable the datahubSystemUpdate
job while making these config changes. On any change to the chart values for ES shards, the datahubSystemUpdate
job will trigger a reindex based on the diff between the expected and actual shard settings.
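Hand-writing the JSON override string is an easy place for quoting mistakes. Below is a minimal Python sketch that builds it. The helper name `build_overrides` is made up for illustration; only the index names and setting keys come from this thread, and the resulting string still has to be pasted into the env var or chart value yourself.

```python
import json


def build_overrides(shards_by_index, replicas=None):
    """Build a compact JSON overrides string from {index_name: shard_count}.

    Values are emitted as strings, matching the examples in this thread.
    """
    overrides = {}
    for index, shards in shards_by_index.items():
        settings = {"number_of_shards": str(shards)}
        if replicas is not None:
            settings["number_of_replicas"] = str(replicas)
        overrides[index] = settings
    # Compact separators keep the value on one line with no stray spaces.
    return json.dumps(overrides, separators=(",", ":"))


print(build_overrides({"datasetindex_v2": 10}))
# -> {"datasetindex_v2":{"number_of_shards":"10"}}
```

Wrap the printed value in single quotes when pasting it into the chart values or a shell export, as in the examples above.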
Thanks <@U0445MUD81W>. I have tried the config '{"datahub_usage_event":{"number_of_shards":"1"},"system_metadata_service_v1":{"number_of_shards":"5"}}'
for datahub_usage_event_XXXXX.
The shard count is still 5. This index is created frequently, with a number as a suffix.
Did that solve your issue?
Did you try "number_of_shards":"5"
or "number_of_shards":"1"?
It did not solve my issue. The datahub_usage_event_xxxx
index still gets created with 5 shards even after adding the "number_of_shards":"1" config.
Maybe what you are seeing are rollovers? The default number of shards for datahub_usage_event is 1.
It might be helpful to explain here what problem you are trying to solve.
The config is
settingsOverrides: '{"datahub_usage_event": {"number_of_shards":"1", "number_of_replicas":"1"}}'
```
dices | grep datahub_usage_event-
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7788 100 7788 0 0 124k 0 --:--:-- --:--:-- --:--:-- 126k
green open datahub_usage_event-000004 H-xxxxxx 5 1 0 0 2kb 1kb
green open datahub_usage_event-000003 f0JLy-xx 5 1 0 0 2kb 1kb
green open datahub_usage_event-000002 4cPlC1-x 5 1 0 0 2kb 1kb
green open datahub_usage_event-000001 xxxxxxxx 5 1 0 0 2kb 1kb
```
I think the default for datahub_usage_event is 5 for some reason.
The default is 1.
Can you run /_cat/indices?h=index,pri and share the output?
```
dices?h=index,pri
chart_chartusagestatisticsaspect_v1 1
datajobindex_v2 1
dataflowindex_v2 1
mlmodelgroupindex_v2 1
assertionindex_v2 1
roleindex_v2 1
dataprocessindex_v2 1
.opendistro-reports-definitions 1
globalsettingsindex_v2 1
.opendistro_security 1
.opendistro-reports-instances 1
chartindex_v2 1
tagindex_v2 1
.opensearch-observability 1
dataplatforminstanceindex_v2 1
telemetryindex_v2 1
datajob_datahubingestionrunsummaryaspect_v1 1
dataplatformindex_v2 1
dataproductindex_v2 1
dataprocessinstanceindex_v2 1
invitetokenindex_v2 1
graph_service_v1 1
system_metadata_service_v1 1
dataset_operationaspect_v1 1
containerindex_v2 1
.tasks 1
schemafieldindex_v2 1
domainindex_v2 1
notebookindex_v2 1
datahubupgradeindex_v2 1
datahubroleindex_v2 1
glossarytermindex_v2 1
postindex_v2 1
dataset_datasetusagestatisticsaspect_v1 1
datahubexecutionrequestindex_v2 1
dataset_datasetprofileaspect_v1 1
datahubsecretindex_v2 1
mlmodelindex_v2 1
datahubpolicyindex_v2 1
corpuserindex_v2 1
datahubstepstateindex_v2 1
datahubviewindex_v2 1
queryindex_v2 1
mlmodeldeploymentindex_v2 1
datajob_datahubingestioncheckpointaspect_v1 1
dashboardindex_v2 1
assertion_assertionruneventaspect_v1 1
datasetindex_v2 1
mlfeatureindex_v2 1
dashboard_dashboardusagestatisticsaspect_v1 1
datahub_usage_event-000004 5
datahub_usage_event-000003 5
datahub_usage_event-000005 5
glossarynodeindex_v2 1
datahubingestionsourceindex_v2 1
datahubretentionindex_v2 1
ownershiptypeindex_v2 1
dataprocessinstance_dataprocessinstanceruneventaspect_v1 1
.kibana_1 1
datahub_usage_event-000002 5
datahubaccesstokenindex_v2 1
datahub_usage_event-000001 5
testindex_v2 1
mlfeaturetableindex_v2 1
.opendistro-job-scheduler-lock 5
mlprimarykeyindex_v2 1
corpgroupindex_v2 1
```
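With this many indices it is easy to miss the outliers. A small Python sketch, assuming the two-column `index pri` text produced by `/_cat/indices?h=index,pri`, can filter the listing down to the indices whose primary shard count differs from the expected value. The function name and the sample text are illustrative; the sample rows are copied from the listing above.

```python
def indices_with_unexpected_shards(cat_output, expected=1):
    """Return {index_name: primary_shards} for rows whose count != expected."""
    unexpected = {}
    for line in cat_output.splitlines():
        parts = line.split()
        # Skip anything that is not an "<index> <number>" row,
        # such as blank lines or curl progress output.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        index, primaries = parts[0], int(parts[1])
        if primaries != expected:
            unexpected[index] = primaries
    return unexpected


sample = """\
datasetindex_v2 1
datahub_usage_event-000004 5
.opendistro-job-scheduler-lock 5
corpgroupindex_v2 1
"""
print(indices_with_unexpected_shards(sample))
# -> {'datahub_usage_event-000004': 5, '.opendistro-job-scheduler-lock': 5}
```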
So it looks like, because the number of shards wasn’t specified when this index was created, it defaults to 5. I’m guessing you were on Elasticsearch 6.x or lower.
> Previous versions of Elasticsearch defaulted to creating five shards per index. Starting with 7.0.0, the default is now one shard per index.
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/breaking-changes-7.0.html#_index_creation_no_longer_defaults_to_five_shards
On the DataHub side, it doesn’t look like there’s an ES Index Rebuilder for datahub_usage_event.
Your options here are to either live with 5 shards or manually re-create the index with 1 shard. There’s currently no automation in DataHub to recreate/rebuild it.
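For the manual route, here is a rough Python sketch of the generic Elasticsearch REST calls involved: create a new index with explicit settings, copy documents across with `_reindex`, then delete the old index. It only builds and prints the requests and does not send anything. The function name and the target index name are hypothetical, and it is an assumption that this sequence is sufficient for your cluster: it does not handle the write alias, and it does not touch whatever template or rollover policy creates new datahub_usage_event-NNNNNN indices, so later rollovers may still come up with 5 shards.

```python
import json


def recreate_plan(old_index, new_index, shards=1, replicas=1):
    """Return (method, path, body) tuples for re-creating an index with new settings."""
    settings = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return [
        # 1. Create the replacement index with the desired shard count.
        ("PUT", f"/{new_index}", settings),
        # 2. Copy existing documents into it.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # 3. Destructive: remove the old index only after verifying the copy.
        ("DELETE", f"/{old_index}", None),
    ]


# Hypothetical target name, chosen only for illustration.
for method, path, body in recreate_plan(
    "datahub_usage_event-000001", "datahub_usage_event-000001-1shard"
):
    print(method, path, json.dumps(body) if body is not None else "")
```

Keeping the plan as plain data makes it easy to review each request before running it with curl or a client of your choice.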
I will manually recreate it. Thanks Davi