For example if the watch stream is clogged, or when using etcd version v3.3 and older.
If the watch cache doesn't catch up within some time limit, we either fail the request or fall back.
For Beta we have implemented a fallback mechanism that reverts to the normal behavior, a consistent read served directly from etcd, before the request reaches its timeout.
To monitor the fallback rate we introduced the `apiserver_watch_cache_consistent_read_total` metric with `fallback` and `success` labels.
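
For illustration, here is a minimal sketch of such a wait-then-fallback path; this is not the actual Kubernetes implementation. The `FreshnessWaiter` interface, the 3-second deadline, and the metric label values are assumptions made for the example.

```go
package cache

import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Object stands in for a decoded API object; its details don't matter here.
type Object interface{}

// Lister is the read interface shared by the watch cache and etcd-backed storage.
type Lister interface {
	List(ctx context.Context) ([]Object, error)
}

// FreshnessWaiter is a watch cache that can block until it has observed a
// given etcd revision.
type FreshnessWaiter interface {
	Lister
	WaitUntilFresh(ctx context.Context, requiredRV uint64) error
}

// Counter mirroring apiserver_watch_cache_consistent_read_total; the exact
// label set in Kubernetes may differ from this sketch.
var consistentReadTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "apiserver_watch_cache_consistent_read_total"},
	[]string{"success", "fallback"},
)

// ConsistentList serves a consistent LIST from the cache, reverting to the
// normal behavior (reading from etcd) if the cache cannot catch up in time.
func ConsistentList(ctx context.Context, cache FreshnessWaiter, etcd Lister, requiredRV uint64) ([]Object, error) {
	// Wait with a deadline shorter than the overall request timeout so the
	// fallback still has time to complete against etcd.
	waitCtx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	if err := cache.WaitUntilFresh(waitCtx, requiredRV); err != nil {
		consistentReadTotal.WithLabelValues("false", "true").Inc()
		return etcd.List(ctx) // pre-Beta behavior: consistent read from etcd
	}
	consistentReadTotal.WithLabelValues("true", "false").Inc()
	return cache.List(ctx)
}
```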

With qualification results showing how rarely the fallback is actually needed, we can go back to the original design.
We should fail the requests and rely on rate limiting to prevent cascading failure, i.e. the `Retry-After` HTTP header (for
well-behaved clients) and [Apiserver Priority and Fairness](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
The main reason is the added complexity of the fallback and its incorrect handling in APF, which assumes that request cost doesn't change.
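
For contrast, a sketch of this fail-instead-of-fallback behavior, reusing the illustrative `FreshnessWaiter` interface from the previous example (requires `net/http` in addition; the handler shape and the `Retry-After` value are assumptions):

```go
// serveConsistentList fails the request instead of falling back to etcd when
// the cache cannot become fresh in time. Well-behaved clients honor the
// Retry-After header, and APF throttles the rest, so load doesn't cascade to etcd.
func serveConsistentList(w http.ResponseWriter, r *http.Request, cache FreshnessWaiter, requiredRV uint64) {
	waitCtx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
	defer cancel()

	if err := cache.WaitUntilFresh(waitCtx, requiredRV); err != nil {
		// Reject with a retry hint rather than shifting the read onto etcd.
		w.Header().Set("Retry-After", "1")
		http.Error(w, "watch cache has not caught up, retry later", http.StatusTooManyRequests)
		return
	}
	// ... serve the LIST from the now-fresh cache ...
}
```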

### How to debug cache issues?

Let's present how the system currently works
in different cases. In addition to that, we add a column indicating whether a given
case will change how the watchcache implementation handles the request.

The existing API already allows us to achieve it.

### Qualification

As the feature touches a hard distributed caching problem and depends on previously broken etcd behavior, we need to qualify it in a production environment.

After almost a year in Beta, running enabled by default, we have collected the following data:
* More than 80% of LISTs served by the apiserver are consistent reads from cache.
* In 99.9% of cases the watch cache became fresh enough within 110ms.
* Only 0.001% of waits for a fresh cache took more than 250ms.
* Consistent reads reached five nines of availability, meaning the cache was able to become fresh before the timeout (3s) in 99.999% of cases.
* The main cause of fallback was a rolling update of etcd, which forces a watch cache reinitialization.
* We have identified and addressed one issue: https://github.com/kubernetes/kubernetes/issues/129931

The above results show that consistent reads from cache are stable and reliable. We are confident in promoting this feature to Stable.

## Design Details

### Monitoring

To further allow debugging and improve confidence we will provide users with the
following tools:
* a dedicated `apiserver_watch_cache_read_wait` metric to detect a problem with
the watch cache.
* an `inconsistency detector` that, for requests served from the watchcache, will be able
to send a request to etcd (as described above) and compare the results.

The `apiserver_watch_cache_read_wait` metric will measure the wait time experienced by
reads for the watch cache to become fresh. If a user notices increased request latency,
they can use this metric to confirm that the issue is caused by the watch cache.
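
As an illustration of where the measurement sits, a sketch that wraps the freshness wait in a histogram observation, continuing the illustrative types from the earlier sketches (the bucket layout and helper name are assumptions, not the real implementation):

```go
// Histogram mirroring apiserver_watch_cache_read_wait; buckets are chosen
// arbitrarily for the sketch.
var readWait = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "apiserver_watch_cache_read_wait",
	Buckets: []float64{0.005, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 3},
})

// waitForFreshCache records how long a consistent read blocked waiting for
// the watch cache to catch up to requiredRV.
func waitForFreshCache(ctx context.Context, cache FreshnessWaiter, requiredRV uint64) error {
	start := time.Now()
	defer func() { readWait.Observe(time.Since(start).Seconds()) }()
	return cache.WaitUntilFresh(ctx, requiredRV)
}
```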

The `inconsistency detector` will be enabled in our CI to detect issues with
the introduced mechanism.
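
A minimal sketch of what such a detector could do, assuming both backends can be read at the same resourceVersion (the function name is illustrative; requires `fmt` and `reflect` in addition to the earlier imports):

```go
// verifyCacheConsistency replays a LIST served from the watch cache against
// etcd and reports any divergence. Both reads must be pinned to the same
// resourceVersion for the comparison to be meaningful.
func verifyCacheConsistency(ctx context.Context, cache, etcd Lister) error {
	fromCache, err := cache.List(ctx)
	if err != nil {
		return err
	}
	fromEtcd, err := etcd.List(ctx)
	if err != nil {
		return err
	}
	if !reflect.DeepEqual(fromCache, fromEtcd) {
		return fmt.Errorf("watch cache inconsistent with etcd: %d objects from cache, %d from etcd",
			len(fromCache), len(fromEtcd))
	}
	return nil
}
```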

### Pagination

Pagination from cache is being implemented as part of [KEP-4988](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/4988-snapshottable-api-server-cache/README.md).

### Test Plan