fix: rate limiting token bucket timing and dict memory leak #4057
strawgate wants to merge 2 commits
Conversation
surely we can find a non-hand-rolled TTL dict
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5c8f0f2fa
```python
if self.tokens >= tokens:
    self.tokens -= tokens
    self.last_refill = now
    return True
```
Advance token-bucket clock on denied consumes
Keeping last_refill inside the success branch causes elapsed time to be counted repeatedly across denied requests: after one denial, the next retry still computes elapsed from the old timestamp and adds the same interval again. In practice, a client that retries quickly after depleting tokens can refill much faster than refill_rate (bypassing the configured rate limit). The refill timestamp needs to be updated whenever refill math is applied, not only when tokens are consumed.
You're right - my fix was incorrect. Moving last_refill inside the success branch freezes the refill clock on denials, so each retry re-counts the same elapsed interval, and a client can accumulate tokens faster than the configured rate by repeatedly retrying after denial.
I've reverted this change. The token bucket now correctly updates last_refill on every call (the original behavior), which matches standard token-bucket semantics.
The only fix in this PR is now the dict TTL eviction for per-client rate limiters (issue #4053).
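For reference, a minimal sketch of the standard semantics described above, where the refill clock advances on every call. The class and names here are illustrative, not the PR's actual code:

```python
import time

class TokenBucket:
    """Illustrative token bucket sketch; not the PR's implementation."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity        # maximum tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def consume(self, tokens: float = 1.0) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Refill and advance the clock on every call, so a denied request
        # cannot re-count the same elapsed interval on its next retry.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
```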
CI failed due to a pre-existing flaky test unrelated to this change; this PR only modifies the rate-limiting middleware.
Force-pushed from e5c8f0f to a6dde66
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6dde667c1
```python
stale_ids = [
    cid
    for cid, last_access in self._limiter_access_times.items()
    if now - last_access > self._limiter_ttl
]
```
Preserve token debt when evicting stale client limiters
Evicting buckets purely on a fixed 300s inactivity window can under-enforce low-rate limits: if capacity / refill_rate is greater than 5 minutes, a client can be removed before its bucket would naturally refill, then recreated with full tokens on the next access (_get_limiter), effectively erasing accrued debt. In a multi-client server, traffic from other clients will trigger this cleanup path, so a client configured for very slow refill (e.g. 0.001 rps) can regain burst capacity far earlier than configured.
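To make that window concrete, a quick back-of-the-envelope calculation with hypothetical numbers (not taken from the PR):

```python
# Hypothetical config: a slow-refill, high-burst per-client limit.
capacity = 100      # burst size in tokens
refill_rate = 0.01  # tokens per second
ttl = 300           # fixed inactivity eviction window in seconds

natural_refill = capacity / refill_rate  # 10,000 s (~2.8 h) to refill fully
# A fully drained bucket evicted after `ttl` seconds of inactivity is
# recreated full on the next request, erasing this much accrued debt:
head_start = natural_refill - ttl        # 9,700 s of refill time skipped
print(natural_refill, head_start)
```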
Valid concern. The 5-min TTL can indeed erase accrued token debt for very low rate limits (e.g., 1 RPS with high capacity). Increased the TTL to 1 hour (3600 s), which provides a better trade-off: the memory leak is solved for realistic use cases (clients returning within an hour), and token-debt leakage only affects sustained low-rate clients that are inactive for more than an hour.
For servers needing stricter guarantees with very low rate limits, a follow-up could replace TTL-based eviction with LRU eviction or track last_refill_time for more precise eviction. This fix addresses the primary issue (unbounded memory growth) with a pragmatic balance.
Good catch. Rather than trying to tune the TTL (which creates second-order effects as you describe), I've replaced the TTL-based approach entirely with an LRU cache using stdlib OrderedDict. This:
- Bounds memory at `max_clients` (default 10,000) without any TTL thread
- Uses `move_to_end` for O(1) LRU access
- Evicts the least recently used client when capacity is reached
- Simpler code than the TTL approach: no timestamps, no cleanup scheduling
This is a better fit for the problem: per-client state is bounded by a fixed capacity, and LRU naturally handles the "most active clients keep their state" semantics.
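A minimal sketch of that shape, assuming a `limiter_factory` callable; the names here are illustrative and may not match the PR's actual identifiers:

```python
from collections import OrderedDict
from typing import Callable

class LRULimiterStore:
    """Sketch of an OrderedDict-based LRU store for per-client limiters."""

    def __init__(self, limiter_factory: Callable[[], object],
                 max_clients: int = 10_000) -> None:
        self._limiter_factory = limiter_factory
        self._max_clients = max_clients
        self._client_limiters: "OrderedDict[str, object]" = OrderedDict()

    def get(self, client_id: str) -> object:
        if client_id in self._client_limiters:
            # O(1) bump to the most-recently-used end.
            self._client_limiters.move_to_end(client_id)
        else:
            self._client_limiters[client_id] = self._limiter_factory()
            if len(self._client_limiters) > self._max_clients:
                # Evict the least recently used client. Note this discards
                # that client's limiter state, which is what the follow-up
                # review comments below flag.
                self._client_limiters.popitem(last=False)
        return self._client_limiters[client_id]
```

On each request the middleware would do something like `store.get(client_id).consume()`, so active clients keep their state while idle ones age out of the fixed-size cache.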
Force-pushed from a6dde66 to 30869a2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30869a2e25
```python
if len(self._client_limiters) > self._max_clients:
    self._client_limiters.popitem(last=False)
```
Preserve token-bucket state when evicting LRU clients
Evicting the oldest limiter here discards that client's remaining-token debt, so once active client cardinality exceeds max_clients, a previously throttled client is recreated with a full bucket on its next request. This allows configured per-client limits to be bypassed under high-cardinality traffic (or by cycling client IDs), because rate-limit state is reset by eviction rather than refill.
```python
if len(self._client_limiters) > self._max_clients:
    self._client_limiters.popitem(last=False)
```
Retain sliding-window history across cache pressure
This LRU eviction drops a client's timestamp deque entirely; if the middleware sees more than max_clients distinct clients, an evicted client returns with an empty window and can immediately send another full max_requests burst. That under-enforces the advertised window limit whenever client cardinality crosses the cache size.
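As a reference point, a minimal sliding-window check over a per-client timestamp deque. The names are illustrative; `max_requests` and `window_seconds` stand in for the middleware's configured limits:

```python
import time
from collections import deque

def allow_request(window: deque, max_requests: int,
                  window_seconds: float) -> bool:
    """Sliding-window sketch: evicting `window` from the LRU cache resets
    this history, letting the client burst again immediately."""
    now = time.monotonic()
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > window_seconds:
        window.popleft()
    if len(window) < max_requests:
        window.append(now)
        return True
    return False
```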
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 30869a2 to d280a50
Summary
Fixes a memory leak in `RateLimitingMiddleware` and `SlidingWindowRateLimitingMiddleware` where per-client rate limiter entries were stored in an unbounded `defaultdict`, causing unlimited memory growth as new client IDs were seen.

Solution: Replaced the `defaultdict` with an `OrderedDict`-based LRU cache with a configurable `max_clients` limit (default 10,000). When the limit is reached, the least recently used client is evicted to make room for a new client.

This approach:
- Uses only the stdlib (`collections.OrderedDict`)
- Also covers `SlidingWindowRateLimitingMiddleware`, which had the same `defaultdict` issue

Fixes #4053.