Avoid leaking extra-pattern matches in scrub reasons#2048
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a sensitive-data leak in Logfire’s scrubbing system where scrub markers could echo the matched secret substring when using ScrubbingOptions(extra_patterns=...), and adds regression coverage to prevent reintroducing the issue.
Changes:
- Compiles scrubbing regexes with per-pattern named groups to attribute a match to a specific configured pattern.
- Updates redaction “reason” generation to use the configured
extra_patternsregex string (instead of the matched substring) while preserving existing behavior for default patterns. - Adds a regression test for URL-credential scrubbing to ensure the scrub marker and
logfire.scrubbedmetadata do not contain the secret.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
logfire/_internal/scrubbing.py |
Changes scrubber regex compilation and redaction reason selection to avoid leaking matched secrets for extra_patterns. |
tests/test_secret_scrubbing.py |
Adds a regression test ensuring scrub reasons don’t echo URL credentials matched via extra_patterns. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| matched_substring = match.pattern_match.group(0) | ||
| self.scrubbed.append(ScrubbedNote(path=match.path, matched_substring=matched_substring)) | ||
| return f'[Scrubbed due to {matched_substring!r}]' | ||
| reason = self._pattern_reason_by_group.get(match.pattern_match.lastgroup or '', matched_substring) or matched_substring | ||
| self.scrubbed.append(ScrubbedNote(path=match.path, matched_substring=reason)) | ||
| return f'[Scrubbed due to {reason!r}]' |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
📝 WalkthroughWalkthroughThe scrubbing regex compilation in `logfire/_internal/scrubbing.py` now wraps each configured pattern in a uniquely named capture group and builds a mapping from group name to either `None` (default patterns) or the originating pattern string (extra patterns). `SpanScrubber` copies this mapping. The `_redact` function now derives the stored `ScrubbedNote` reason from the matched group via this mapping instead of always using the raw matched substring, preventing extra-pattern redaction markers from echoing sensitive matched text. A new test verifies this behavior for a PostgreSQL connection URL pattern. Changes
Sequence Diagram(s)sequenceDiagram
participant Span as SpanScrubber
participant Redact as _redact
participant Map as _pattern_reason_by_group
Span->>Redact: regex match with lastgroup
Redact->>Map: lookup reason by group name
Map-->>Redact: None (default) or pattern string (extra)
Redact->>Redact: choose matched_substring or pattern string as reason
Redact-->>Span: ScrubbedNote(reason) recorded safely
Related issues: Suggested labels: bug, security, scrubbing Suggested reviewers: alexmojaki, Kludex Poem: 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_secret_scrubbing.py (1)
340-358: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winPrefer the inline_snapshot pattern for span assertions.
The rest of this module (e.g.
test_scrubbing_config) asserts viaexporter.exported_spans_as_dict(...) == snapshot(...). This test hand-picks attributes instead. Keep thesecret not in ...guards (they document intent well), but add asnapshot()assertion so drift in the full span is caught.As per coding guidelines: "Tests that create spans should use TestExporter and inline_snapshot with the pattern... assert with exporter.exported_spans_as_dict(parse_json_attributes=True) == snapshot()".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_secret_scrubbing.py` around lines 340 - 358, The span assertion in test_extra_pattern_redaction_reason_does_not_echo_secret only checks selected attributes, so it can miss unrelated drift in the emitted span. Keep the existing secret-not-in guards, but update the assertion to use exporter.exported_spans_as_dict(parse_json_attributes=True) with inline snapshot() like the other tests in this module (for example test_scrubbing_config) so the full span shape is verified while preserving the redaction checks.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_secret_scrubbing.py`:
- Around line 340-358: The span assertion in
test_extra_pattern_redaction_reason_does_not_echo_secret only checks selected
attributes, so it can miss unrelated drift in the emitted span. Keep the
existing secret-not-in guards, but update the assertion to use
exporter.exported_spans_as_dict(parse_json_attributes=True) with inline
snapshot() like the other tests in this module (for example
test_scrubbing_config) so the full span shape is verified while preserving the
redaction checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 0b20ed9f-2f40-4786-aafa-beca62e01f17
📒 Files selected for processing (2)
logfire/_internal/scrubbing.pytests/test_secret_scrubbing.py
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
pydantic/pydantic(auto-detected)pydantic/pydantic-ai(auto-detected)
Summary
extra_patternsTesting
Fixes #1909