fix: rate limiting token bucket timing and dict memory leak #4057

Open

strawgate wants to merge 2 commits into main from codex/rate-limiting-fix

Conversation

@strawgate (Collaborator) commented Apr 26, 2026

Summary

Fixes a memory leak in RateLimitingMiddleware and SlidingWindowRateLimitingMiddleware where per-client rate limiter entries were stored in an unbounded defaultdict, causing unlimited memory growth as new client IDs were seen.

Solution: Replaced the defaultdict with an OrderedDict-based LRU cache with a configurable max_clients limit (default 10,000). When the limit is reached, the least recently used client is evicted to make room for a new client.

This approach:

  • Bounds memory at a configurable maximum
  • Uses only the stdlib (collections.OrderedDict)
  • Avoids TTL thread/cleanup overhead
  • Preserves per-client rate limiting semantics
  • Applies the same fix to SlidingWindowRateLimitingMiddleware, which had the same defaultdict issue

Fixes #4053.
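
For illustration, a minimal sketch of the approach described above (the wrapper class and get() signature are hypothetical; _client_limiters, _max_clients, move_to_end, and popitem(last=False) follow the diff hunks quoted later in this thread):

    from collections import OrderedDict
    from typing import Callable

    class LRUClientLimiters:
        """Bounded per-client limiter store; illustrative sketch only."""

        def __init__(self, max_clients: int = 10_000):
            self._max_clients = max_clients
            self._client_limiters: OrderedDict[str, object] = OrderedDict()

        def get(self, client_id: str, factory: Callable[[], object]) -> object:
            if client_id in self._client_limiters:
                # Mark this client as most recently used.
                self._client_limiters.move_to_end(client_id)
            else:
                self._client_limiters[client_id] = factory()
                if len(self._client_limiters) > self._max_clients:
                    # Evict the least recently used client to bound memory.
                    self._client_limiters.popitem(last=False)
            return self._client_limiters[client_id]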

@strawgate strawgate added the DON'T MERGE PR is not ready for merging. Used by authors to prevent premature merging. label Apr 26, 2026
@strawgate (Collaborator, Author) commented:

Surely we can find a non-hand-rolled TTL dict.

@marvin-context-protocol marvin-context-protocol Bot added bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. server Related to FastMCP server implementation or server-side functionality. labels Apr 26, 2026
@chatgpt-codex-connector (Bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5c8f0f2fa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 54 to 57:

    if self.tokens >= tokens:
        self.tokens -= tokens
        self.last_refill = now
        return True

P1: Advance token-bucket clock on denied consumes

Keeping last_refill inside the success branch causes elapsed time to be counted repeatedly across denied requests: after one denial, the next retry still computes elapsed from the old timestamp and adds the same interval again. In practice, a client that retries quickly after depleting tokens can refill much faster than refill_rate (bypassing the configured rate limit). The refill timestamp needs to be updated whenever refill math is applied, not only when tokens are consumed.

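A minimal sketch of the suggested fix (capacity and refill_rate are assumed attribute names; tokens and last_refill follow the quoted hunk): advance the refill clock whenever the refill math runs, not only on successful consumes.

    import time

    class TokenBucket:
        def __init__(self, capacity: float, refill_rate: float):
            self.capacity = capacity
            self.refill_rate = refill_rate  # tokens per second
            self.tokens = capacity
            self.last_refill = time.monotonic()

        def consume(self, tokens: float = 1.0) -> bool:
            now = time.monotonic()
            # Apply refill math and advance the clock unconditionally, so a
            # denied request cannot re-count the same elapsed interval later.
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False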

@strawgate (Collaborator, Author) replied:

You're right: my fix was incorrect. Moving last_refill inside the success branch freezes the refill clock on denials, so the same elapsed interval is re-counted on every retry, letting a client refill faster than the configured rate.

I've reverted this change. The token bucket now updates last_refill on every call (the original behavior), which matches standard token-bucket semantics.

The only fix in this PR is now the dict TTL eviction for per-client rate limiters (issue #4053).

@marvin-context-protocol (Contributor) commented:

CI failed due to a pre-existing flaky test (test_timeout_tool_call_overrides_client_timeout_even_if_lower) in tests/client/client/test_timeout.py, unrelated to this PR's rate-limiting changes. Safe to re-run.

Root Cause: The test creates a Client(timeout=0.1) then calls client.call_tool("sleep", {"seconds": 0.5}, timeout=2) — expecting the per-call timeout=2 to override the client-level timeout. The tool call itself completes, but the MCP library's _validate_tool_result then internally calls list_tools(), which sends a new request using the client-level 0.1s timeout rather than the per-call override. On Python 3.10 (ubuntu-latest only), this post-call validation step times out. The failure is timing-sensitive and doesn't reproduce on Python 3.13 or Python 3.10 Windows.

This PR only modifies src/fastmcp/server/middleware/rate_limiting.py and has no connection to client timeout logic.

Log excerpt:
FAILED tests/client/client/test_timeout.py::TestTimeout::test_timeout_tool_call_overrides_client_timeout_even_if_lower
  - mcp.shared.exceptions.McpError: Timed out while waiting for response to ClientRequest. Waited 0.1 seconds.

Call stack: call_tool (tools.py:305) → call_tool_mcp (tools.py:161) → _await_with_session_monitoring (client.py:762)
  → mcp/client/session.py:395 call_tool → :403 _validate_tool_result → :529 list_tools → send_request [timeout]
Related files:
  • tests/client/client/test_timeout.py:37–44 — the failing test (pre-exists this PR, present in main)
  • src/fastmcp/client/mixins/tools.py:161 — call_tool_mcp, which passes to _await_with_session_monitoring
  • src/fastmcp/client/client.py:762 — _await_with_session_monitoring
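
For reference, a minimal repro of the failing pattern described above (the server path and sleep tool are hypothetical; the timeout arguments follow the test):

    import asyncio
    from fastmcp import Client

    async def main():
        # Client-level timeout is 0.1 s; the per-call timeout=2 should win.
        async with Client("sleep_server.py", timeout=0.1) as client:
            # The call itself finishes, but the library's post-call
            # _validate_tool_result issues list_tools() under the
            # client-level 0.1 s timeout, which can expire on slow runners.
            await client.call_tool("sleep", {"seconds": 0.5}, timeout=2)

    asyncio.run(main())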

@strawgate force-pushed the codex/rate-limiting-fix branch from e5c8f0f to a6dde66 on April 26, 2026 at 06:16
@chatgpt-codex-connector (Bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6dde667c1

    stale_ids = [
        cid
        for cid, last_access in self._limiter_access_times.items()
        if now - last_access > self._limiter_ttl
    ]

P1: Preserve token debt when evicting stale client limiters

Evicting buckets purely on a fixed 300s inactivity window can under-enforce low-rate limits: if capacity / refill_rate is greater than 5 minutes, a client can be removed before its bucket would naturally refill, then recreated with full tokens on the next access (_get_limiter), effectively erasing accrued debt. In a multi-client server, traffic from other clients will trigger this cleanup path, so a client configured for very slow refill (e.g. 0.001 rps) can regain burst capacity far earlier than configured.


@strawgate (Collaborator, Author) replied:

Valid concern. The 5-min TTL can indeed erase token debt for very low rate limits (e.g., 1 RPS with high capacity). I've increased the TTL to 1 hour (3600 s), which is a better trade-off: the memory leak is solved for realistic use cases (clients returning within an hour), and token-debt leakage only affects sustained low-rate clients that are inactive for more than an hour.

For servers needing stricter guarantees with very low rate limits, a follow-up could replace TTL-based eviction with LRU eviction or track last_refill_time for more precise eviction. This fix addresses the primary issue (unbounded memory growth) with a pragmatic balance.
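
As a quick sanity check on that trade-off (numbers are illustrative, using the slow-refill example from the review comment):

    capacity = 10         # burst size (illustrative)
    refill_rate = 0.001   # tokens per second, the review's slow-refill example
    ttl = 3600            # the new TTL in seconds

    natural_refill_time = capacity / refill_rate  # 10_000 s, roughly 2.8 hours
    print(natural_refill_time > ttl)              # True: eviction can still reset debt early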

@strawgate (Collaborator, Author) added:

Good catch. Rather than trying to tune the TTL (which creates second-order effects as you describe), I've replaced the TTL-based approach entirely with an LRU cache using stdlib OrderedDict. This:

  1. Bounds memory at max_clients (default 10,000) without any TTL thread
  2. Uses move_to_end for O(1) LRU access
  3. Evicts the least recently used client when capacity is reached
  4. Simpler code than the TTL approach — no timestamps, no cleanup scheduling

This is a better fit for the problem: per-client state is bounded by a fixed capacity, and LRU naturally handles the "most active clients keep their state" semantics.

@strawgate force-pushed the codex/rate-limiting-fix branch from a6dde66 to 30869a2 on April 26, 2026 at 15:13
@chatgpt-codex-connector (Bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30869a2e25

Comment on lines +157 to +158:

    if len(self._client_limiters) > self._max_clients:
        self._client_limiters.popitem(last=False)

P1: Preserve token-bucket state when evicting LRU clients

Evicting the oldest limiter here discards that client's remaining-token debt, so once active client cardinality exceeds max_clients, a previously throttled client is recreated with a full bucket on its next request. This allows configured per-client limits to be bypassed under high-cardinality traffic (or by cycling client IDs), because rate-limit state is reset by eviction rather than refill.


Comment on lines +239 to +240:

    if len(self._client_limiters) > self._max_clients:
        self._client_limiters.popitem(last=False)

P1: Retain sliding-window history across cache pressure

This LRU eviction drops a client's timestamp deque entirely; if the middleware sees more than max_clients distinct clients, an evicted client returns with an empty window and can immediately send another full max_requests burst. That under-enforces the advertised window limit whenever client cardinality crosses the cache size.

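A hypothetical snippet illustrating both review points (all names and numbers are invented; max_clients is 2 for brevity):

    from collections import OrderedDict

    MAX_CLIENTS = 2
    limiters: OrderedDict[str, dict] = OrderedDict()

    def get_limiter(client_id: str) -> dict:
        if client_id in limiters:
            limiters.move_to_end(client_id)
        else:
            limiters[client_id] = {"tokens": 5.0}  # fresh, full bucket
            if len(limiters) > MAX_CLIENTS:
                limiters.popitem(last=False)       # drops the evicted client's state
        return limiters[client_id]

    get_limiter("a")["tokens"] = 0.0    # "a" is fully throttled
    get_limiter("b"); get_limiter("c")  # two new clients evict "a"
    print(get_limiter("a")["tokens"])   # 5.0 -- "a" returns with a full bucket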

@strawgate force-pushed the codex/rate-limiting-fix branch from 30869a2 to d280a50 on May 13, 2026 at 03:40