Commit c77d7ca

proposal: Added proposal for new Thanos component: Thanos Frontend. (thanos-io#2434)
* proposal: Added proposal for new Thanos component: Thanos Frontend.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added more rationales for separate binary.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Marco comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changed to approved.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Moved to query-frontend command.

Signed-off-by: Bartlomiej Plotka <[email protected]>
1 parent 8eba769 commit c77d7ca

2 files changed

+138
-0
lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
---
title: Adding a New Thanos Component that Embeds Cortex Query Frontend
type: proposal
menu: proposals
status: approved
owner: bwplotka
---

### Related Tickets

* Response caching: https://github.com/thanos-io/thanos/issues/1651
* Moving query frontend to separate repo: https://github.com/cortexproject/Cortex/issues/1672
* Discussion about naming: https://cloud-native.slack.com/archives/CK5RSSC10/p1586939369171300

## Summary

This proposal describes the addition of a new Thanos command (component) called `query-frontend` to `cmd/thanos`.
This component will directly import a pinned version of the Cortex [frontend package](https://github.com/cortexproject/Cortex/tree/4410bed704e7d8f63418b02b328ddb93d99fad0b/pkg/querier/frontend).

We will go through the rationale and potential alternatives.

## Motivation

[Cortex Frontend](https://www.youtube.com/watch?v=eyBbImSDOrI&t=2s) was introduced by Tom in August 2019. It was designed
to be deployed in front of the Prometheus Query API in order to provide:

* Query splitting by time.
* Query step alignment.
* Query retry logic.
* Query limit logic.
* Query response caching in memory, Memcached or Redis.

Since the Cortex backend is very similar to Thanos, with exactly the same PromQL API and long-term capabilities, the caching
work done for Cortex fits Thanos as well. Given our good collaboration in the past, it feels natural to reuse Cortex's code.
We even started a discussion about moving it to a separate repo, but there was little motivation to do so, since there is no problem with using
the Cortex one, as Cortex is happy to take generalized contributions.

In the end, we were recommending the Cortex query frontend for production use on top of Thanos. This works considerably well, with some
problems in edge cases and for downsampled data, as mentioned [here](https://github.com/thanos-io/thanos/issues/1651).

However, we recently realized that asking users to suddenly install a Cortex component on top of a Thanos system is extremely confusing:

* Cortex has a totally different way of configuring services. It requires choosing which module to run in a single YAML file. Thanos, in contrast,
has flags and a different subcommand for each component.
* Cortex has a slightly different way of configuring Memcached, which is inconsistent with what we have in Thanos Store Gateway.
* There are many Cortex-specific configuration items that can confuse Thanos users and increase overall complexity.
* We have many ideas for improving the Cortex Query Frontend on top of Thanos, but adding Thanos-specific configuration options would increase
complexity on the Cortex side as well.
* Cortex has no good example or tutorial on how to use the frontend either. We have only the [Observatorium example](https://github.com/observatorium/configuration/blob/5129a8beb9507f29aec05566ca9a0f2ad82bbf76/environments/openshift/manifests/observatorium-template.yaml#L515).

All of this was causing confusion and questions like [this](https://cloud-native.slack.com/archives/CK5RSSC10/p1586504362400300?thread_ts=1586492170.387900&cid=CK5RSSC10).

In the end, we decided together with the Thanos and Cortex maintainers that, ultimately, it would be awesome to create a new Thanos service called `query-frontend`.

## Use Cases

* Users can cache responses for range queries.
* Users can split range queries.
* Users can rate limit and retry range queries.
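
To make the retry use case more concrete, here is a minimal Go sketch of a bounded retry loop. It is not the Cortex implementation: `doWithRetries`, `errServer` and the rule "retry only on server-side failures" are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// errServer is a hypothetical marker for failures that this sketch
// treats as retryable (for example a 5xx from the downstream Querier).
var errServer = errors.New("downstream server error")

// doWithRetries calls fn until it succeeds, returns a non-retryable
// error, or maxRetries attempts have been made. It reports the response,
// the number of attempts used, and the final error.
func doWithRetries(maxRetries int, fn func() (string, error)) (string, int, error) {
	if maxRetries < 1 {
		maxRetries = 1 // always try at least once
	}
	var lastErr error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		resp, err := fn()
		if err == nil {
			return resp, attempt, nil
		}
		if !errors.Is(err, errServer) {
			// Assumption: client-side errors are not worth retrying.
			return "", attempt, err
		}
		lastErr = err
	}
	return "", maxRetries, lastErr
}

func main() {
	calls := 0
	// A fake downstream that fails twice before answering.
	resp, attempts, err := doWithRetries(5, func() (string, error) {
		calls++
		if calls < 3 {
			return "", errServer
		}
		return "ok", nil
	})
	fmt.Println(resp, attempts, err)
}
```

Bounding the number of attempts keeps a persistently failing query from occupying the frontend indefinitely.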

## Goals of this design

* Enable response caching that will be easy to use for Thanos users.
* Keep it extensible and scalable for future improvements like advanced query planning, queuing, rate limiting etc.
* Reuse as much as possible between projects and contribute back.
* Use the same configuration patterns as the rest of the Thanos components.

## Non Goals

* Create Thanos-specific response caching from scratch.

## Proposal

The idea is to create a `thanos query-frontend` component that allows specifying the following options:

* `--query-range.split-interval`, `time.Duration`
* `--query-range.max-retries-per-request`, `int`, default = `5`
* `--query-range.disable-step-align`, `bool`
* `--query-range.response-cache-ttl`, `time.Duration`
* `--query-range.response-cache-max-freshness`, `time.Duration`, default = `1m`
* `--query-range.response-cache-config(-file)`, `pathorcontent` + [CacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32)

We plan to have in-mem, fifo and memcached support for now. The cache config will be exactly the same as the one used for Store Gateway.

This command will be the place for any query planning or queueing logic that we might want to add at some point. It will not be part of any gRPC API.

To make this happen, we will propose a small refactor in the Cortex code to avoid unnecessary package dependencies.
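
As an illustration of what `--query-range.split-interval` and step alignment are meant to control, the following Go sketch aligns a range query to its step and then cuts it into sub-queries that do not cross interval boundaries. The `queryRange` type and both functions are hypothetical simplifications for this document, not the Cortex code; times are plain Unix milliseconds and are assumed to be non-negative, with a positive step and interval.

```go
package main

import "fmt"

// queryRange is a hypothetical, simplified stand-in for a Prometheus
// query_range request. All values are Unix milliseconds.
type queryRange struct {
	Start, End, Step int64
}

// alignToStep rounds start and end down to multiples of step, so that
// requests sent at slightly different moments evaluate the same
// timestamps and can therefore share cached results.
func alignToStep(r queryRange) queryRange {
	return queryRange{
		Start: (r.Start / r.Step) * r.Step,
		End:   (r.End / r.Step) * r.Step,
		Step:  r.Step,
	}
}

// splitByInterval cuts r into consecutive sub-ranges such that no
// sub-range contains evaluation timestamps from two different
// interval-sized buckets. The step grid of the original query is kept.
func splitByInterval(r queryRange, interval int64) []queryRange {
	var out []queryRange
	for start := r.Start; start <= r.End; {
		// First instant that belongs to the next bucket.
		boundary := (start/interval + 1) * interval
		// Last evaluation timestamp strictly before that boundary.
		end := start + ((boundary-1-start)/r.Step)*r.Step
		if end > r.End {
			end = r.End
		}
		out = append(out, queryRange{Start: start, End: end, Step: r.Step})
		start = end + r.Step
	}
	return out
}

func main() {
	r := alignToStep(queryRange{Start: 1003, End: 2507, Step: 10})
	for _, part := range splitByInterval(r, 1000) {
		fmt.Printf("%+v\n", part)
	}
}
```

Each resulting sub-range can be sent downstream and cached independently, which is what makes splitting useful for both parallelism and cache reuse.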

### Alternatives

#### Don't add anything, document Cortex query frontend and add examples of usage

Unfortunately, we have already tried this path without success. The reasons are described in [Motivation](202004_embedd_cortex_frontend.md#Motivation).

#### Add response caching to Querier itself, in the same process.

Allowing the Querier to cache directly would definitely simplify deployment. However, this approach is not really scalable.

Furthermore, the frontend will eventually be responsible for more than just caching. It is meant to do query planning, such as splitting or even
advanced query parallelization (query sharding). This might mean future improvements in terms of query scheduling, queuing and retrying.
This means that at some point we would need the ability to scale the query part and the caching/query planner part totally separately.

Last but not least, splitting queries allows requests to be performed in parallel. Only when used in a single binary can we achieve load balancing of those requests.

NOTE: We can still consider simple response caching inside the Querier if users request it.

#### Write response caching from scratch.

I think this does not need to be explained. Response caching has proven to be non-trivial. It's really amazing that we
have the opportunity to work towards something that works, together with experts in the field like @tomwilkie and others from the Loki and Cortex teams.

Overall, [Reusing is caring](https://www.bwplotka.dev/2020/how-to-became-oss-maintainer/#5-want-more-help-give-back-help-others).

## Work Plan

1. Refactor [IndexCacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32) into a generic cache config so we can reuse it.
   Make it implement the Cortex `cache.Cache` interface.
1. Add the necessary changes to the Cortex frontend:
   * Metric generalization (they are globals now).
   * Avoid unnecessary dependencies.
1. Add the `thanos query-frontend` subcommand.
1. Add a proper e2e test using the cache.
1. Document the new subcommand.
1. Add it to [kube-thanos](https://github.com/thanos-io/kube-thanos).
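
The first step of the plan, making a Thanos cache satisfy a Cortex-style cache interface, could look roughly like the Go sketch below. The `Cache` interface here is a local stand-in with an assumed `Store`/`Fetch`/`Stop` shape; the real Cortex `cache.Cache` definition should be checked at the pinned commit. `memCache` is a hypothetical map-backed implementation used only to show the adapter idea.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Cache is a local stand-in for the interface the frontend would expect.
// ASSUMPTION: the method set is modeled for illustration and is not
// copied from Cortex.
type Cache interface {
	Store(ctx context.Context, keys []string, bufs [][]byte)
	Fetch(ctx context.Context, keys []string) (found []string, bufs [][]byte, missing []string)
	Stop()
}

// memCache is a hypothetical in-memory backend guarded by a mutex.
type memCache struct {
	mu   sync.RWMutex
	data map[string][]byte
}

// Compile-time check that memCache satisfies the stand-in interface.
var _ Cache = (*memCache)(nil)

func newMemCache() *memCache {
	return &memCache{data: make(map[string][]byte)}
}

// Store saves each key with the buffer at the same index; keys without
// a matching buffer are ignored.
func (c *memCache) Store(_ context.Context, keys []string, bufs [][]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for i, k := range keys {
		if i >= len(bufs) {
			return
		}
		c.data[k] = bufs[i]
	}
}

// Fetch reports which keys were found (with their buffers, in the same
// order) and which keys were missing.
func (c *memCache) Fetch(_ context.Context, keys []string) (found []string, bufs [][]byte, missing []string) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, k := range keys {
		if b, ok := c.data[k]; ok {
			found = append(found, k)
			bufs = append(bufs, b)
		} else {
			missing = append(missing, k)
		}
	}
	return found, bufs, missing
}

// Stop is a no-op because this backend holds no external resources.
func (c *memCache) Stop() {}

func main() {
	var c Cache = newMemCache()
	ctx := context.Background()
	c.Store(ctx, []string{"q1"}, [][]byte{[]byte("response")})
	found, _, missing := c.Fetch(ctx, []string{"q1", "q2"})
	fmt.Println(found, missing)
}
```

Keeping the adapter behind a small interface means the same Thanos cache configuration could back both the Store Gateway and the frontend.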

## Future Work

Improvements to the Cortex query frontend, and therefore to Thanos `query-frontend`, as described [here](https://github.com/thanos-io/thanos/issues/1651).

docs/proposals/_index.md

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
---
title: "Proposals:"
---

List of current proposals.

Proposals can have 5 Statuses (`.Params.Status`):

* accepted
* complete
* rejected
* in-review
* draft
