fix: bound in-memory rate limit storage to prevent memory exhaustion (CWE-400) #4607

Open
sebastiondev wants to merge 1 commit into IBM:main from sebastiondev:security/cwe400-bounded-rate-limit-storage
@sebastiondev

Summary

The in-memory rate limit storage in mcpgateway/admin.py is an unbounded defaultdict(list) keyed by client IP. Each new client IP that hits a @rate_limit endpoint adds a new key, and keys are never removed — even after the timestamps for that IP have all expired. Over time (or under sustained load from many distinct source IPs) the dict grows without bound, eventually exhausting process memory.
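
The growth pattern described above can be reproduced with a small standalone sketch. It is not the code from mcpgateway/admin.py: the `record_request` helper, the `WINDOW_SECONDS` constant, and the explicit `now` argument are assumptions made for illustration, and the clock is simulated so the example runs instantly.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window length, used here only for illustration

rate_limit_storage = defaultdict(list)


def record_request(client_ip: str, now: float) -> None:
    """Hypothetical stand-in: prune this IP's expired timestamps, then record the request."""
    cutoff = now - WINDOW_SECONDS
    rate_limit_storage[client_ip] = [t for t in rate_limit_storage[client_ip] if t > cutoff]
    rate_limit_storage[client_ip].append(now)


# Simulate 1,000 distinct client IPs that each send a single request.
start = 1_000_000.0
for i in range(1000):
    record_request(f"10.0.{i // 256}.{i % 256}", start)

# Ten minutes later, only one of those clients is still active.
record_request("10.0.0.0", start + 600)

# Pruning only ever touches the caller's own list, so the other 999 keys
# (and their long-expired timestamps) are still held in memory.
key_count = len(rate_limit_storage)
```

In this sketch `key_count` never decreases, because nothing revisits a key once its client stops sending requests.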

This PR bounds the storage to 10,000 entries using an OrderedDict with LRU-style eviction, and prunes keys whose timestamps have fully expired.

  • CWE: CWE-400 (Uncontrolled Resource Consumption)
  • Severity: Low — see "Honest scoping" below
  • File: mcpgateway/admin.py (rate_limit_storage and the rate_limit() decorator)
  • Data flow: request.client.host (TCP peer address) → used as a dict key in rate_limit_storage → key is created on first request and never deleted.

Fix

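# Before
from collections import defaultdict
rate_limit_storage = defaultdict(list)

# After
from collections import OrderedDict
_RATE_LIMIT_MAX_KEYS: int = 10_000  # hard cap on the number of tracked client IPs
rate_limit_storage: OrderedDict[str, list] = OrderedDict()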

Inside the decorator:

  1. Prune expired timestamps for the current IP into a local list, then write back.
  2. move_to_end(client_ip) so active IPs are treated as most-recently-used.
  3. Evict the oldest entries with popitem(last=False) while the dict exceeds _RATE_LIMIT_MAX_KEYS.
  4. Remove keys whose timestamp lists are empty or whose last timestamp is older than the 60-second window.
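
The four steps above can be sketched as a plain function. This is a minimal illustration under stated assumptions, not the decorator from the PR: `check_and_record`, `WINDOW_SECONDS`, the `max_keys` parameter, and the explicit `now` argument are hypothetical names introduced here, and step 4 is written as a full scan for simplicity (the PR does not say how the stale-key pass is implemented).

```python
from collections import OrderedDict

_RATE_LIMIT_MAX_KEYS = 10_000
WINDOW_SECONDS = 60  # assumed to match the 60-second window in step 4

rate_limit_storage: "OrderedDict[str, list]" = OrderedDict()


def check_and_record(client_ip: str, now: float, limit: int,
                     max_keys: int = _RATE_LIMIT_MAX_KEYS) -> bool:
    """Return True if the request is allowed, False if the caller is over the limit."""
    cutoff = now - WINDOW_SECONDS

    # Step 1: prune expired timestamps into a local list, then write back.
    timestamps = [t for t in rate_limit_storage.get(client_ip, []) if t > cutoff]
    allowed = len(timestamps) < limit
    if allowed:
        timestamps.append(now)
    rate_limit_storage[client_ip] = timestamps

    # Step 2: mark this IP as most recently used.
    rate_limit_storage.move_to_end(client_ip)

    # Step 3: evict least-recently-used IPs while over the cap.
    while len(rate_limit_storage) > max_keys:
        rate_limit_storage.popitem(last=False)

    # Step 4: drop keys with no timestamps left inside the window.
    stale = [ip for ip, ts in rate_limit_storage.items() if not ts or ts[-1] <= cutoff]
    for ip in stale:
        del rate_limit_storage[ip]

    return allowed


# Usage: with a cap of 3 keys, five distinct IPs leave only the three newest.
for i in range(5):
    check_and_record(f"198.51.100.{i}", now=1000.0, limit=3, max_keys=3)
remaining = list(rate_limit_storage)
```

One design note on the sketch: writing the pruned list back before `move_to_end` guarantees the key exists, which matters because `OrderedDict`, unlike `defaultdict`, does not create missing keys on access.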

The cap of 10,000 was chosen to comfortably accommodate any plausible legitimate client population for an admin API while keeping worst-case memory bounded (~hundreds of KB).
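
The worst-case memory estimate can be checked empirically rather than taken on faith. The sketch below uses the standard-library `tracemalloc` module to measure a filled `OrderedDict`; the helper name `measure_storage_bytes` and the synthetic IP strings are assumptions for illustration, and the result will vary with the Python version, the platform, and how many timestamps each endpoint's limit allows per key.

```python
import tracemalloc
from collections import OrderedDict


def measure_storage_bytes(num_keys: int, timestamps_per_key: int) -> int:
    """Approximate bytes allocated by a storage dict of the given shape (hypothetical helper)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()

    storage = OrderedDict()
    for i in range(num_keys):
        # Synthetic, distinct IPv4-style keys.
        ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        # Each timestamp is a separate float object, as time.time() values would be.
        storage[ip] = [1_000_000.0 + i + j for j in range(timestamps_per_key)]

    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Usage: measure the storage at the cap, with one timestamp per key.
full_bytes = measure_storage_bytes(10_000, 1)
print(f"{full_bytes / 1024:.0f} KiB at 10,000 keys")
```

Running it with a larger `timestamps_per_key` shows how the bound scales with the per-endpoint request limit as well as with the key cap.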

Tests

  • The existing test in tests/unit/mcpgateway/test_admin.py was updated to assert rate_limit_storage is an OrderedDict rather than a defaultdict, and to use the new access pattern.
  • The decorator's docstring example was updated to match the new type.
  • All existing rate_limit behavior (limit enforcement, 429 response, per-IP isolation, time-window pruning) is preserved.

Security analysis

We verified the issue is real but its exploitability is limited:

  • Authenticated only. Every endpoint that uses @rate_limit is also gated by @require_permission, so an attacker needs valid credentials before any of their traffic reaches this code path.
  • TCP peer, not header. The key is request.client.host (the TCP source address), not a header-derived value. X-Forwarded-For spoofing does not create new keys — distinct TCP source IPs are required, which means a real botnet or proxy pool.
  • Admin API is dev-only by policy. SECURITY.md documents that the Admin UI/API should not be exposed in production.

Given those preconditions, growth is slow and gated. But "unbounded dict on a long-running process" is still a real defense-in-depth bug worth fixing — especially since the fix is small, local, and changes no externally observable behavior.

Honest scoping

Before submitting we tried to disprove this. We checked whether anything upstream (auth, framework limits, or another cleanup path) would already bound the dict — nothing does; entries are added on first use and there is no eviction or TTL anywhere. We also want to be straightforward that the worst-case attacker here is an authenticated user with access to many distinct source IPs against an admin API that your own SECURITY.md says shouldn't be in production. That puts this firmly in low-severity / hardening territory rather than a critical DoS, and we'd rather under-claim than over-claim.

cc @lewiswigmore

…(CWE-400)

Replace unbounded defaultdict(list) with a capped OrderedDict for
rate_limit_storage. The storage is now bounded to 10,000 keys maximum.

Changes:
- Use OrderedDict instead of defaultdict for rate_limit_storage
- Evict least-recently-used IPs when storage exceeds _RATE_LIMIT_MAX_KEYS
- Remove keys whose timestamps have all expired (stale key cleanup)
- Update existing test to validate stale entry cleanup behavior
- Update doctest to reflect OrderedDict type

Signed-off-by: Sebastion <sebastion@sebastion.dev>