Skip to content

chore(plugins): align rate-limiter config with gateway convention#4582

Open
gandhipratik203 wants to merge 3 commits intomainfrom
chore/rate-limiter-redis-config-hardening
Open

chore(plugins): align rate-limiter config with gateway convention#4582
gandhipratik203 wants to merge 3 commits intomainfrom
chore/rate-limiter-redis-config-hardening

Conversation

@gandhipratik203
Copy link
Copy Markdown
Collaborator

@gandhipratik203 gandhipratik203 commented May 4, 2026

Summary

Align the rate-limiter plugin's redis_url in plugins/config.yaml with the gateway's existing convention of sourcing Redis URLs from the REDIS_URL env via the plugin loader's Jinja substitution. Operators set REDIS_URL once and the rate-limiter picks it up — no more hand-editing the yaml or templating it in deployment tooling.

Companion to #4581.

Changes

  • plugins/config.yaml: redis_url: "redis://redis:6379/0"redis_url: '{{ env.REDIS_URL | default("redis://redis:6379/0") }}' plus a short comment about the env-source. Outer single / inner double quotes survive yaml.safe_dump round-trips cleanly.
  • tests/integration/test_rate_limiter_redis_url_from_yaml.py: new test — env-var substitution flows through the plugin loader cleanly and the resolved URL reaches a live Redis (PING → PONG). Skips when Redis isn't reachable.

Test plan

  • pytest tests/integration/test_rate_limiter_redis_url_from_yaml.py green locally with Redis running
  • Skips cleanly without Redis
  • CI green

The schema-regression test and redis_fallback cleanup originally bundled here are now tracked as #4603.

@msureshkumar88
Copy link
Copy Markdown
Collaborator

msureshkumar88 commented May 5, 2026

Me review your PR! Changes GOOD idea but have few problems that MUST fix before merge.

🔴 CRITICAL Issues (Must Fix Now!)

1. Bare Exception Swallow All Errors 🚨

  • Location: test_plugins_config_yaml_validation.py lines 116-120
  • Problem: Code catch ALL exceptions, even security failures!
  • Attack: Bad plugin could bypass rate limiting if fail to load
  • Fix: Only catch connection errors, let security failures FAIL test

2. No URL Validation 🚨

  • Location: test_plugins_config_yaml_validation.py lines 54-61
  • Problem: No check if redis_url is safe! Could be SSRF attack vector
  • Attack: REDIS_URL pointing to internal-admin service could steal data
  • Fix: Validate URL scheme is redis, check hostname not internal, mask credentials

3. Breaking Change: redis_fallback Removal Not Justified ⚠️

  • You remove redis_fallback: true but:
    • Still in docs (plugin-bindings-api.md line 345)
    • Still in Helm charts (values-pgo.yaml line 101)
    • Still in tests (test_rate_limiter.py line 1267)
    • Existing test validate this behavior!
  • Impact: If Redis down, what happen? Fail open (no rate limit) or fail closed (deny all)?
  • Fix: Show evidence key is unrecognized OR revert change OR update ALL docs/tests

🟡 Should Fix (Recommended)

  1. Performance: Use async Redis client instead of blocking sync call
  2. Testing Gap: Add negative tests (bad URLs, missing env vars, malformed Jinja)
  3. Logging Gap: Add audit trail when config changes

Read full report for:

  • All issues with code examples
  • Alternative approaches you could use
  • Risk assessment matrix
  • Specific code fixes for each issue

✅ What Need Before Merge

MUST FIX (Blocking):

  1. Fix bare exception handling
  2. Add URL validation
  3. Justify redis_fallback removal OR revert (breaking change)
  4. Update ALL documentation consistently

SHOULD FIX:
5. Use async Redis client
6. Add negative test cases
7. Add audit logging

🎯 Verdict

⚠️ REQUEST CHANGES - Critical issues found

Estimated effort: 4-6 hours to fix all critical issues

Copy link
Copy Markdown
Collaborator

@msureshkumar88 msureshkumar88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in comments

Closes #4581.

Sources the rate-limiter plugin's redis_url from the gateway's REDIS_URL
env var via the plugin loader's existing Jinja substitution, instead of
hardcoding it as a literal string in plugins/config.yaml.

Why
---

The gateway already sources its own Redis URL from REDIS_URL via
pydantic Settings (mcpgateway/config.py). Operators set REDIS_URL once
and the gateway picks it up.  The rate-limiter plugin previously
ignored that env var and forced a second, separate edit if operators
wanted to point the limiter at a different Redis (different host,
different DB number, different deployment topology, etc.). One env var
mapping to two configuration surfaces is mildly surprising and creates
drift over time.

Backwards-compatible: docker-compose.yml exports
REDIS_URL=redis://redis:6379/0 by default, so the resolved value
matches the prior literal in local dev.  Operators who currently set
REDIS_URL in their deployment pick up the new convention transparently.

Test
----

Adds tests/integration/test_rate_limiter_redis_url_from_yaml.py — a
small smoke test that loads plugins/config.yaml, asserts the redis_url
resolves to a non-empty string with no leaked Jinja placeholder, and
PINGs a live Redis at the resolved URL.  Skips cleanly when Redis
isn't reachable, so the suite stays green on runners without Redis.

Out of scope

  * URL log sanitization or other observability changes.
  * Plugin-loader changes — Jinja substitution already supported.
  * Other plugin yaml entries — this PR is scoped to the rate-limiter
    block only.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ng to survive yaml round-trip

The previous form ``"{{ env.REDIS_URL }}"`` rendered to an empty string
in any environment that did not set the ``REDIS_URL`` env var, including
the existing ``test_manager_initializes_packaged_plugins_from_shipped_configs``
unit test on CI.  The empty URL then hit the redis crate as a parse
failure (``InvalidClientConfig: Redis URL did not parse``).

Fix uses Jinja's ``default()`` filter to fall back to the previous
literal default when the env var is unset.  Operators in production
who set ``REDIS_URL`` get the override; CI / fresh-laptop environments
get the same value as before this PR.

Quoting note: YAML outer-single + Jinja inner-double, not the other
way around.  ``yaml.safe_dump`` (used by the existing test to filter
plugins and write a tmp config) doubles the inner ``'`` when it
round-trips a double-quoted YAML scalar, producing
``default(''redis://...'')`` which Jinja parses as empty-string + bare
identifier ``redis`` (the source of the CI failure).  Inverting the
quotes preserves the Jinja expression byte-for-byte across
``yaml.safe_load`` -> ``yaml.safe_dump`` and is the standard
recommendation for embedding Jinja in YAML scalars.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@gandhipratik203 gandhipratik203 force-pushed the chore/rate-limiter-redis-config-hardening branch from a1292e1 to f34d7a1 Compare May 5, 2026 13:58
@gandhipratik203 gandhipratik203 changed the title chore(plugins): align rate-limiter config with gateway convention chore(plugins): align rate-limiter redis_url with gateway env-var convention May 5, 2026
@gandhipratik203
Copy link
Copy Markdown
Collaborator Author

gandhipratik203 commented May 5, 2026

Thanks for the review. Scope-narrowing this PR to just the redis_url Jinja substitution: the schema-validation regression test and redis_fallback removal that triggered most of these comments have moved to #4603, which catalogues the broader cleanup (docs, helm chart, perf configs, unit fixtures, and a misleadingly-named integration test). Starting-point branch: chore/rate-limiter-redis-fallback-cleanup-wip.

That moots 1 and 3 of your review (both lived in the code that's been moved out).

2 (SSRF on REDIS_URL): the test sets the env var itself via monkeypatch.setenv to localhost — there is no untrusted input crossing a trust boundary, so SSRF doesn't apply. Validating a value the test just wrote would be testing our own hardcoded string.

4–6: not applicable in the narrowed scope. A one-shot Redis PING doesn't need an async client; a positive-path test that env-var substitution flows through the loader cleanly doesn't need adversarial inputs; nothing mutates config at runtime here so there's no audit surface.

@gandhipratik203 gandhipratik203 changed the title chore(plugins): align rate-limiter redis_url with gateway env-var convention chore(plugins): align rate-limiter config with gatewayconvention May 5, 2026
@gandhipratik203 gandhipratik203 changed the title chore(plugins): align rate-limiter config with gatewayconvention chore(plugins): align rate-limiter config with gateway convention May 5, 2026
Copy link
Copy Markdown
Collaborator

@msureshkumar88 msureshkumar88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good direction — the Jinja | default(...) pattern matches what the rest of config.yaml already does for IBM_WATSON_*, and the outer-single/inner-double quoting to survive yaml.safe_dump round-trips is a nice touch. A couple of things worth addressing before this lands:

Blocking

Relative path in the integration test

cfg = ConfigLoader.load_config("plugins/config.yaml")   # test line 56

This resolves from cwd, so the test silently breaks whenever pytest is invoked from any directory other than the repo root (e.g. cd tests && pytest). Suggest anchoring to the file's own location:

import pathlib
_REPO_ROOT = pathlib.Path(__file__).resolve().parents[3]
cfg = ConfigLoader.load_config(str(_REPO_ROOT / "plugins" / "config.yaml"))

Security Scan CI failure

25/28 checks pass but the Security Scan job is failing. Worth checking whether Bandit is flagging something in the new test file (e.g. the bare except Exception in _redis_reachable, or the socket usage) and either suppressing with a # nosec + justification comment or adjusting the code. Happy to take a look at the job log if you can share it.

Suggestions

No test for the default fallback path

The test always injects REDIS_URL via monkeypatch.setenv, so the | default("redis://redis:6379/0") branch is never exercised. A small unit test (no live Redis needed) would pin that:

def test_rate_limiter_redis_url_uses_default_when_env_unset(monkeypatch, tmp_path):
    monkeypatch.delenv("REDIS_URL", raising=False)
    cfg = ConfigLoader.load_config("plugins/config.yaml")
    rl = next(p for p in cfg.plugins if p.name == "RateLimiterPlugin")
    assert rl.config.get("redis_url") == "redis://redis:6379/0"

(Same relative-path caveat applies here.)

Minor

The comment in config.yaml mentions WASM sandboxes — a bit surprising in a Redis URL context, since WASM plugins typically can't open TCP sockets. Trimming to just "separate container" would avoid confusion:

# Sourced from the gateway's REDIS_URL env; when the plugin runs in a
# separate container, ensure that host is reachable from within it.

One pre-existing issue worth tracking separately (not blocking this PR): SandboxedEnvironment in config.py:80 is constructed with autoescape=True, which HTML-encodes &, <, >, ". That's correct for HTML templates but wrong for YAML — if REDIS_URL ever includes a percent-encoded password or similar, the rendered value would be silently corrupted. Probably worth a follow-up issue since it affects all Jinja-templated values in the config, not just this one.

Address review feedback on #4582:

- ``test_rate_limiter_redis_url_from_yaml.py``: anchor
  ``plugins/config.yaml`` to the test file's location via
  ``pathlib.Path(__file__).resolve().parents[2]`` so the test works
  whether pytest is invoked from the repo root or from any
  subdirectory (``cd tests && pytest`` was silently broken).
- Add ``test_rate_limiter_redis_url_uses_default_when_env_unset``:
  the existing test always set ``REDIS_URL`` via
  ``monkeypatch.setenv``, so the ``| default("redis://redis:6379/0")``
  branch was untested.  The new test deletes the env var and
  asserts the default resolves — runs without a Redis service.
- ``plugins/config.yaml``: trim the ``redis_url`` comment to drop
  the WASM-sandbox reference (WASM plugins can't open TCP sockets,
  so ``REDIS_URL`` doesn't apply to them).  Keep the
  separate-container caveat which is the realistic case.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
gandhipratik203 added a commit that referenced this pull request May 5, 2026
Mirrors the same fix applied to PR #4582 so that when this branch is
eventually rebased onto main as part of #4603, the path-anchoring fix
is preserved in the renamed test file.

- ``test_plugins_config_yaml_validation.py``: anchor
  ``plugins/config.yaml`` to the test file's location via
  ``pathlib.Path(__file__).resolve().parents[2]`` so the tests work
  whether pytest is invoked from the repo root or a subdirectory.
  Applied to both the redis-url-resolves test and the schema-validation
  test (the latter also hardcoded the relative path).
- Add ``test_rate_limiter_redis_url_uses_default_when_env_unset``: the
  existing redis-url test always set ``REDIS_URL`` via
  ``monkeypatch.setenv``, so the ``| default("redis://redis:6379/0")``
  branch was untested.  The new test deletes the env var and asserts
  the default resolves; runs without a Redis service.
- ``plugins/config.yaml``: trim the ``redis_url`` comment to drop the
  WASM-sandbox reference.  Keep the separate-container caveat.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@gandhipratik203
Copy link
Copy Markdown
Collaborator Author

Thanks — this was a great review. Addressed all four actionable items in aa5ded9; each backported to the WIP branch for #4603 in ee9436e so the path-anchor fix carries forward when that work picks up.

Relative path — anchored via pathlib.Path(__file__).resolve().parents[2]. Verified the test passes both from repo root and from tests/.

Security Scan failure — the job log says The job was not acquired by Runner of type hosted even after multiple attempts (queued 1h30m, abandoned). Infra, not code. The push to aa5ded9 re-triggered CI; the new Security Scan run is in progress.

Default fallback test — added test_rate_limiter_redis_url_uses_default_when_env_unset exactly as suggested. Runs without Redis.

WASM comment — trimmed to the separate-container case as suggested. You're right that WASM plugins can't open TCP sockets, so REDIS_URL doesn't apply to that path.

autoescape pre-existing bug — filed as #4605. Sharp catch; agree it's broader than this PR (affects every Jinja-templated value, not just redis_url) and warrants its own fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants