Avoid leaking extra-pattern matches in scrub reasons #2048

Open
Whning0513 wants to merge 2 commits into pydantic:main from Whning0513:fix-extra-pattern-scrub-reason-1909

@Whning0513 Whning0513 commented Jul 1, 2026

Summary

  • compile scrub patterns with per-pattern groups so we can identify which configured pattern matched
  • keep existing default scrub reasons, but use the configured extra pattern string instead of the matched secret substring
  • add a regression test covering URL credential scrubbing via extra_patterns
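
The first bullet can be sketched with the standard `re` module alone. This is a minimal illustration rather than the code in this PR: the pattern lists, the `default_N`/`extra_N` group names, the `compile_with_groups` helper, and the regex flags are all assumptions made for the example.

```python
import re

# Hypothetical stand-ins for the default and user-supplied patterns.
DEFAULT_PATTERNS = ['password', 'api[._ -]?key']
EXTRA_PATTERNS = [r'postgres(?:ql)?://[^\s]+']


def compile_with_groups(defaults, extras):
    """Wrap each pattern in a uniquely named group and record its reason."""
    parts = []
    reason_by_group = {}
    for i, pattern in enumerate(defaults):
        name = f'default_{i}'
        parts.append(f'(?P<{name}>{pattern})')
        reason_by_group[name] = None  # None: keep the matched-substring reason
    for i, pattern in enumerate(extras):
        name = f'extra_{i}'
        parts.append(f'(?P<{name}>{pattern})')
        reason_by_group[name] = pattern  # report the pattern, not the match
    # Flags are illustrative; the real scrubber may use different ones.
    return re.compile('|'.join(parts), re.IGNORECASE), reason_by_group


regex, reason_by_group = compile_with_groups(DEFAULT_PATTERNS, EXTRA_PATTERNS)
match = regex.search('dsn=postgresql://user:hunter2@db/prod')
# lastgroup names the alternative that matched, which keys the reason map.
print(match.lastgroup, reason_by_group[match.lastgroup])
```

Because each alternative sits in its own named group, `match.lastgroup` identifies the configured pattern without inspecting the matched text.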

Testing

  • python -m pytest -q tests/test_secret_scrubbing.py -k extra_pattern_redaction_reason_does_not_echo_secret
  • python -m pytest -q tests/test_secret_scrubbing.py
  • python -m pytest -q tests/test_print.py -k instrument_print

Fixes #1909

Copilot AI review requested due to automatic review settings July 1, 2026 13:16

Copilot AI left a comment

Pull request overview

This PR addresses a sensitive-data leak in Logfire’s scrubbing system where scrub markers could echo the matched secret substring when using ScrubbingOptions(extra_patterns=...), and adds regression coverage to prevent reintroducing the issue.

Changes:

  • Compiles scrubbing regexes with per-pattern named groups to attribute a match to a specific configured pattern.
  • Updates redaction “reason” generation to use the configured extra_patterns regex string (instead of the matched substring) while preserving existing behavior for default patterns.
  • Adds a regression test for URL-credential scrubbing to ensure the scrub marker and logfire.scrubbed metadata do not contain the secret.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| logfire/_internal/scrubbing.py | Changes scrubber regex compilation and redaction reason selection to avoid leaking matched secrets for extra_patterns. |
| tests/test_secret_scrubbing.py | Adds a regression test ensuring scrub reasons don’t echo URL credentials matched via extra_patterns. |

Comment on lines 355 to +358
 matched_substring = match.pattern_match.group(0)
-self.scrubbed.append(ScrubbedNote(path=match.path, matched_substring=matched_substring))
-return f'[Scrubbed due to {matched_substring!r}]'
+reason = self._pattern_reason_by_group.get(match.pattern_match.lastgroup or '', matched_substring) or matched_substring
+self.scrubbed.append(ScrubbedNote(path=match.path, matched_substring=reason))
+return f'[Scrubbed due to {reason!r}]'
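
The `.get(...) or matched_substring` expression has two fallbacks. A rough standalone sketch follows, assuming the map stores `None` for default patterns as the walkthrough further down describes. The `pick_reason` helper, the group names `d0`/`e0`, and the sample patterns are hypothetical.

```python
import re


def pick_reason(pattern_match, reason_by_group):
    """Return the text to show in the scrub marker for one regex match."""
    matched_substring = pattern_match.group(0)
    # Unknown group name: .get() falls back to the matched substring.
    # Stored None (default patterns) is falsy, so `or` falls back as well.
    return reason_by_group.get(pattern_match.lastgroup or '', matched_substring) or matched_substring


regex = re.compile(r'(?P<d0>password)|(?P<e0>token=\w+)')
reasons = {'d0': None, 'e0': r'token=\w+'}

default_hit = regex.search('my password is set')
extra_hit = regex.search('url?token=abc123')
print(f'[Scrubbed due to {pick_reason(default_hit, reasons)!r}]')
print(f'[Scrubbed due to {pick_reason(extra_hit, reasons)!r}]')
```

In this sketch a default-pattern hit still reports the matched keyword, while an extra-pattern hit reports only the configured regex string.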

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

codecov Bot commented Jul 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@hramezani hramezani requested a review from alexmojaki July 2, 2026 07:36

coderabbitai Bot commented Jul 2, 2026

📝 Walkthrough

The scrubbing regex compilation in `logfire/_internal/scrubbing.py` now wraps each configured pattern in a uniquely named capture group and builds a mapping from group name to either `None` (default patterns) or the originating pattern string (extra patterns). `SpanScrubber` copies this mapping. The `_redact` function now derives the stored `ScrubbedNote` reason from the matched group via this mapping instead of always using the raw matched substring, preventing extra-pattern redaction markers from echoing sensitive matched text. A new test verifies this behavior for a PostgreSQL connection URL pattern.

Changes

| Area | Change |
| --- | --- |
| `logfire/_internal/scrubbing.py` | Named capture groups per pattern; reason lookup map; `_redact` uses reason instead of raw matched text for extra patterns |
| `tests/test_secret_scrubbing.py` | New test asserting scrub reason does not leak the matched secret for `extra_patterns` |

Sequence Diagram(s)

sequenceDiagram
  participant Span as SpanScrubber
  participant Redact as _redact
  participant Map as _pattern_reason_by_group
  Span->>Redact: regex match with lastgroup
  Redact->>Map: lookup reason by group name
  Map-->>Redact: None (default) or pattern string (extra)
  Redact->>Redact: choose matched_substring or pattern string as reason
  Redact-->>Span: ScrubbedNote(reason) recorded safely

Related issues: #1909 — fixes the scrub-message leak where `extra_patterns` matches exposed the matched credential substring in the `[Scrubbed due to '...']` marker.
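
To make the leak concrete, here is a small before/after comparison using only `re`. The pattern and the credential URL are made up for illustration, and the marker strings are built by hand rather than by the library.

```python
import re

EXTRA_PATTERN = r'postgresql://[^\s]+'  # illustrative extra pattern
value = 'postgresql://app:hunter2@db.internal/prod'  # made-up credential URL

match = re.search(EXTRA_PATTERN, value)
assert match is not None

# Behaviour described in the issue: the marker repeats what was matched.
leaky_marker = f'[Scrubbed due to {match.group(0)!r}]'
# Behaviour after this change: the marker names the configured pattern.
safe_marker = f'[Scrubbed due to {EXTRA_PATTERN!r}]'

print(leaky_marker)
print(safe_marker)
```

The first marker still contains the password from the URL; the second contains only the regex the user configured.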

Suggested labels: bug, security, scrubbing

Suggested reviewers: alexmojaki, Kludex

Poem:
A rabbit hopped through regex dens,
Named each group with careful pens,
No more secrets in the reason shown,
Just patterns marked, the leak now gone,
Hop, hop, hooray — the burrow's safe again! 🐇🔒

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main fix: preventing scrub reasons from leaking extra-pattern matches. |
| Description check | ✅ Passed | The description matches the implemented changes and regression test for extra_patterns scrubbing. |
| Linked Issues check | ✅ Passed | The change removes matched secret text from extra-pattern scrub reasons, which addresses issue #1909. |
| Out of Scope Changes check | ✅ Passed | The diff stays focused on scrubbing logic and a regression test, with no obvious unrelated changes. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/test_secret_scrubbing.py (1)

340-358: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Prefer the inline_snapshot pattern for span assertions.

The rest of this module (e.g. test_scrubbing_config) asserts via exporter.exported_spans_as_dict(...) == snapshot(...). This test hand-picks attributes instead. Keep the secret not in ... guards (they document intent well), but add a snapshot() assertion so drift in the full span is caught.

As per coding guidelines: "Tests that create spans should use TestExporter and inline_snapshot with the pattern... assert with exporter.exported_spans_as_dict(parse_json_attributes=True) == snapshot()".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_secret_scrubbing.py` around lines 340 - 358, The span assertion in
test_extra_pattern_redaction_reason_does_not_echo_secret only checks selected
attributes, so it can miss unrelated drift in the emitted span. Keep the
existing secret-not-in guards, but update the assertion to use
exporter.exported_spans_as_dict(parse_json_attributes=True) with inline
snapshot() like the other tests in this module (for example
test_scrubbing_config) so the full span shape is verified while preserving the
redaction checks.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b20ed9f-2f40-4786-aafa-beca62e01f17

📥 Commits

Reviewing files that changed from the base of the PR and between ef5c776 and 5560f63.

📒 Files selected for processing (2)
  • logfire/_internal/scrubbing.py
  • tests/test_secret_scrubbing.py
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • pydantic/pydantic (auto-detected)
  • pydantic/pydantic-ai (auto-detected)

Development

Successfully merging this pull request may close these issues.

ScrubbingOptions extra_patterns: scrub-message leaks the matched credential substring

2 participants