fix: bound in-memory rate limit storage to prevent memory exhaustion (CWE-400)#4607
Open
sebastiondev wants to merge 1 commit intoIBM:mainfrom
Open
fix: bound in-memory rate limit storage to prevent memory exhaustion (CWE-400)#4607sebastiondev wants to merge 1 commit intoIBM:mainfrom
sebastiondev wants to merge 1 commit intoIBM:mainfrom
Conversation
…(CWE-400) Replace unbounded defaultdict(list) with a capped OrderedDict for rate_limit_storage. The storage is now bounded to 10,000 keys maximum. Changes: - Use OrderedDict instead of defaultdict for rate_limit_storage - Evict least-recently-used IPs when storage exceeds _RATE_LIMIT_MAX_KEYS - Remove keys whose timestamps have all expired (stale key cleanup) - Update existing test to validate stale entry cleanup behavior - Update doctest to reflect OrderedDict type Signed-off-by: Sebastion <sebastion@sebastion.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The in-memory rate limit storage in
mcpgateway/admin.pyis an unboundeddefaultdict(list)keyed by client IP. Each new client IP that hits a@rate_limitendpoint adds a new key, and keys are never removed — even after the timestamps for that IP have all expired. Over time (or under sustained load from many distinct source IPs) the dict grows without bound, eventually exhausting process memory.This PR bounds the storage to 10,000 entries using an
OrderedDictwith LRU-style eviction, and prunes keys whose timestamps have fully expired.mcpgateway/admin.py—rate_limit_storageand therate_limit()decoratorrequest.client.host(TCP peer address) → used as a dict key inrate_limit_storage→ key is created on first request and never deleted.Fix
Inside the decorator:
move_to_end(client_ip)so active IPs are treated as most-recently-used.popitem(last=False)while the dict exceeds_RATE_LIMIT_MAX_KEYS.The cap of 10,000 was chosen to comfortably accommodate any plausible legitimate client population for an admin API while keeping worst-case memory bounded (~hundreds of KB).
Tests
tests/unit/mcpgateway/test_admin.pywas updated to assertrate_limit_storageis anOrderedDictrather than adefaultdict, and to use the new access pattern.rate_limitbehavior (limit enforcement, 429 response, per-IP isolation, time-window pruning) is preserved.Security analysis
We verified the issue is real but its exploitability is limited:
@rate_limitis also gated by@require_permission, so an attacker needs valid credentials before any of their traffic reaches this code path.request.client.host(the TCP source address), not a header-derived value.X-Forwarded-Forspoofing does not create new keys — distinct TCP source IPs are required, which means a real botnet or proxy pool.SECURITY.mddocuments that the Admin UI/API should not be exposed in production.Given those preconditions, growth is slow and gated. But "unbounded dict on a long-running process" is still a real defense-in-depth bug worth fixing — especially since the fix is small, local, and changes no externally observable behavior.
Honest scoping
Before submitting we tried to disprove this. We checked whether anything upstream (auth, framework limits, or another cleanup path) would already bound the dict — nothing does; entries are added on first use and there is no eviction or TTL anywhere. We also want to be straightforward that the worst-case attacker here is an authenticated user with access to many distinct source IPs against an admin API that your own SECURITY.md says shouldn't be in production. That puts this firmly in low-severity / hardening territory rather than a critical DoS, and we'd rather under-claim than over-claim.
cc @lewiswigmore