Skip to content

feat(thanos): add timebased partitioning shards to compact and storegateway#1211

Open
MeNsaaH wants to merge 3 commits intostevehipwell:mainfrom
MeNsaaH:sharded-thanos
Open

feat(thanos): add timebased partitioning shards to compact and storegateway#1211
MeNsaaH wants to merge 3 commits intostevehipwell:mainfrom
MeNsaaH:sharded-thanos

Conversation

@MeNsaaH
Copy link
Copy Markdown
Contributor

@MeNsaaH MeNsaaH commented Jul 12, 2025

This PR adds support for [time-based partitioning](https://thanos.io/tip/operating/compactor-backlog.md/#scale-the-compactor:~:text=To%20scale%20the%20compactor%20horizontally...) in both the Thanos Compactor and Store Gateway components via configurable Helm values.

Motivation

  • Supports large-scale environments where compaction or query latency becomes a bottleneck.
  • Allows teams to distribute workload across multiple instances in a time-partitioned manner.
  • Helps manage historical data differently from recent or hot data.

This PR supersedes and reimplements the stalled [PR #835](#835) as it hasn't received recent updates or responses.

@MeNsaaH
Copy link
Copy Markdown
Contributor Author

MeNsaaH commented Jul 12, 2025

@stevehipwell to answer your question to #835 (comment), trying to do that makes the chart complex and difficult to read. Creating a different sharded manifest for the shards makes it easier to maintain

@stevehipwell
Copy link
Copy Markdown
Owner

Thanks for the PR @MeNsaaH, the same general comments apply here as on #1209.

@stevehipwell to answer your question to #835 (comment), trying to do that makes the chart complex and difficult to read. Creating a different sharded manifest for the shards makes it easier to maintain

I have to disagree with this position, from a chart maintenance perspective this adds significant additional overhead. My preference would be to add a sharding block at the top of the templates that need to be shard-aware and then use the values defined there to conditionally configure the template for the normal or sharded version. This way I only have a single template to manage and even if it's more complex it's significantly easier to maintain (maintenance issues are generally caused by incomplete work not the complexity of the chart).

@MeNsaaH MeNsaaH force-pushed the sharded-thanos branch 4 times, most recently from 976c560 to aa3072b Compare September 14, 2025 17:07
@MeNsaaH
Copy link
Copy Markdown
Contributor Author

MeNsaaH commented Sep 14, 2025

@stevehipwell this change is now ready. When you get some time kindly review, thanks!

Copy link
Copy Markdown
Owner

@stevehipwell stevehipwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @MeNsaaH, I've added some comments. Could you also update the changelog to cover the changes being made in the PR.

Generate compact shards list as JSON
*/}}
{{- define "thanos.compact.shards" -}}
{{- if .Values.compact.enabled -}}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to check this in the function as it should only be used inside a conditional check.

{{- if $.Values.compact.enabled -}}
{{- $shards := (include "thanos.compact.shards" .) | fromJsonArray }}
{{- range $shard := $shards }}
---
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
---
{{- range $i, $shard := $shards }}
{{- if $i gt 0 }}
---
{{- end }}

I'd rather only have separators where required.

metadata:
name: {{ printf "%s-headless" (include "thanos.compact.fullname" .) }}
namespace: {{ .Release.Namespace }}
name: {{ $shard.fullname }}-headless
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please continue to use printf where I've used it.

Comment on lines +28 to +31
{{- include "thanos.compact.selectorLabels" $ | nindent 4 }}
{{- if $.Values.compact.sharded.enabled }}
shard: {{ $shard.name | quote }}
{{- end }}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pre-calculate the shard selector labels.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean as pasrt of "thanos.compact.selectorLabels"? Or as a different include function?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They'd need to be computed so something like $shard.SelectorLabels.

Comment on lines +38 to +40
{{- if $.Values.compact.sharded.enabled }}
shard: {{ $shard.name | quote }}
{{- end }}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is doing here? It looks like a duplication of the change above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah. you're right. Nice catch

{{- if $.Values.compact.sharded.enabled }}
shard: {{ $shard.name | quote }}
{{- end }}
{{- if or $.Values.objstoreConfig.create (gt (len $.Values.compact.podAnnotations) 0) }}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- if or $.Values.objstoreConfig.create (gt (len $.Values.compact.podAnnotations) 0) }}
{{- if or $.Values.objstoreConfig.create $.Values.compact.podAnnotations }}

Non-empty dicts evaluate to true.

Comment thread charts/thanos/templates/store-gateway/statefulset.yaml
Comment thread charts/thanos/values.yaml
Comment on lines +98 to +108
sharded:
# -- If `true`, create a sharded _Store Gateway_ component.
enabled: false
# -- Shards using time partitioning for the _Store Gateway_ component.
timePartitioning:
# -- Optional. shards will have the name as the suffix of the statefulset name.
- name:
# -- Optional. thanos store min-time
minTime:
# -- Optional. thanos store max-time
maxTime:
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sharded:
# -- If `true`, create a sharded _Store Gateway_ component.
enabled: false
# -- Shards using time partitioning for the _Store Gateway_ component.
timePartitioning:
# -- Optional. shards will have the name as the suffix of the statefulset name.
- name:
# -- Optional. thanos store min-time
minTime:
# -- Optional. thanos store max-time
maxTime:
shards:
# -- (string) Shard name, if not set this will be the stateful set name.
- name:
# -- (string) Start of shard time range limit.
minTime:
# -- (string) End of shard time range limit.
maxTime:

I don't think we need an enabled flag here when enabled can be evaluated as .Values.compact.shards.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairs, that's one way to go about it. Looks cleaner too, I like it

@MeNsaaH
Copy link
Copy Markdown
Contributor Author

MeNsaaH commented Sep 20, 2025

@stevehipwell I'm rethinking the strategy for this. Why:

  1. having shards means we'll need to allow modifying all the specs for each shards e.g resources, affinity etc.
  2. This PR adds the option to disable query and other components, so I wonder if it'll be better to just use reuse the chart as a second dependency e.g
# Chart.yaml
dependencies:
- name: thanos
  repository: https://stevehipwell.github.io/helm-charts/
  version: 1.21.1
- name: thanos
  repository: https://stevehipwell.github.io/helm-charts/
  version: 1.21.1
  alias: thanos-store-gateway-1
# values.yaml
...
# other configs
...

query:
...
  additionalStores:
      - "dnssrv+_grpc._tcp.thanos-store-gateway-1.thanos.svc.cluster.local"
...  
  
storeGateway:
...
# other default store Gateway config
  extraArgs:
      - --min-time=-1w # Set default storegateway time range
 

thanos-store-gateway-1:
  fullnameOverride: thanos-store-gateway-1

  objstoreConfig:
    name: thanos-objstore-config
    create: true
    key: thanos.yaml

  compact:
    enabled: false
  query:
    enabled: false
  queryFrontend:
    enabled: false
  rule:
    enabled: false
  redis:
    enabled: false

  storeGateway:
    extraArgs:
      - --max-time=-1w # configure time shards
      - --min-time=-4w # configure time shards

   # extra configurations for the shards
    persistence:
      enabled: true
      accessMode: ReadWriteOnce
      storageClass: gp3
      size: 50Gi

    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        memory: 2Gi
        cpu: 1

What do you think? I could add this to the README so that folks can know how to enable sharding using this setup

@stevehipwell
Copy link
Copy Markdown
Owner

@MeNsaaH what are the values that you think need overriding on a per-shard basis? I'm pretty happy with the API design but I think we can add in an overrides value to manage a limited set of per-shard values.

shards:
  - name: test
    minTime: ""
    maxTime: ""
    overrides:
      resources: {}

…ateway

Signed-off-by: Mmadu Manasseh <mmadumanasseh@gmail.com>
Signed-off-by: Mmadu Manasseh <mmadumanasseh@gmail.com>
Signed-off-by: Mmadu Manasseh <mmadumanasseh@gmail.com>
@stevehipwell
Copy link
Copy Markdown
Owner

@MeNsaaH I know you're busy at the moment, but is this something you're planning on finishing when you have more time?

@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale This is stale label Dec 26, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 5, 2026

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Jan 5, 2026
@stevehipwell stevehipwell reopened this Feb 9, 2026
@stevehipwell
Copy link
Copy Markdown
Owner

@MeNsaaH are you still planning on working on this?

@Poil
Copy link
Copy Markdown
Contributor

Poil commented Mar 25, 2026

Hi,

I recently migrated from the Bitnami chart to this one. I was using sharded StoreGateways previously, so I'm very interested in this PR. Is there anything I can do to help move this forward?

Specifically, is this PR in a state where it works out of the box for testing, or does it still require significant work? I'd be happy to provide feedback on the migration path and can contribute.

Regards

@stevehipwell
Copy link
Copy Markdown
Owner

@Poil this PR hasn't had activity in 5 months and hasn't had the changes requested in the last review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants