perf: cache compiled regexes in TriggerMatcher (#109)
Conversation
Add a bounded (500-entry) module-level cache for compiled RegExp objects. Eliminates ~10,000 redundant regex compilations per session when checking 1000 messages against 10 trigger patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
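As a rough illustration of the workload described above (the pattern strings, message text, and counter here are illustrative, not taken from the repo), a cache keyed by pattern turns one compilation per check into one per distinct pattern:

```typescript
// Illustrative sketch: 1000 messages x 10 trigger patterns.
const patterns = Array.from({ length: 10 }, (_, i) => `error-${i}\\d+`);
const messages = Array.from({ length: 1000 }, (_, i) => `log line ${i}: error-3 42`);

let compilations = 0;
function compile(pattern: string): RegExp {
  compilations++;
  return new RegExp(pattern, "i");
}

// Uncached: every message x pattern check recompiles the RegExp.
for (const msg of messages) {
  for (const p of patterns) compile(p).test(msg);
}
console.log(compilations); // 10000

// Cached: each distinct pattern compiles exactly once.
const cache = new Map<string, RegExp>();
compilations = 0;
for (const msg of messages) {
  for (const p of patterns) {
    let re = cache.get(p);
    if (!re) {
      re = compile(p);
      cache.set(p, re);
    }
    re.test(msg);
  }
}
console.log(compilations); // 10
```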
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a performance optimization by caching compiled regular expressions. By storing and reusing RegExp objects, it reduces the overhead of recompiling patterns on every invocation of the matching functions, improving the efficiency of pattern-based operations in the TriggerMatcher service.
Code Review
This pull request aims to improve performance by caching compiled regular expressions in TriggerMatcher using a Map as a bounded cache. However, the current implementation is vulnerable to memory exhaustion, potentially leading to a Denial of Service (DoS), as it caches invalid patterns of arbitrary length. Additionally, the caching strategy could be refined from FIFO to a true LRU policy for better effectiveness.
```typescript
function getCachedRegex(pattern: string, flags: string): RegExp | null {
  const key = `${pattern}\0${flags}`;
  if (regexCache.has(key)) {
    return regexCache.get(key) ?? null;
  }

  // Evict oldest entries when cache is full
  if (regexCache.size >= MAX_CACHE_SIZE) {
    const firstKey = regexCache.keys().next().value;
    if (firstKey !== undefined) {
      regexCache.delete(firstKey);
    }
  }

  const regex = createSafeRegExp(pattern, flags);
  regexCache.set(key, regex);
  return regex;
}
```
The getCachedRegex function is vulnerable to Denial of Service (DoS) via memory exhaustion. It creates cache keys from arbitrarily long pattern strings before validating their length. Even if createSafeRegExp rejects long patterns, the long string is cached as a key with a null value, allowing an attacker to fill the cache with large keys and consume significant memory. A length check should be performed before caching. Furthermore, the current cache implementation acts as a FIFO queue; a true LRU (Least Recently Used) strategy would be more effective for retaining frequently used items.
```typescript
function getCachedRegex(pattern: string, flags: string): RegExp | null {
  if (pattern.length > 100) {
    return null;
  }
  const key = `${pattern}\0${flags}`;
  if (regexCache.has(key)) {
    return regexCache.get(key) ?? null;
  }
  // Evict oldest entries when cache is full
  if (regexCache.size >= MAX_CACHE_SIZE) {
    const firstKey = regexCache.keys().next().value;
    if (firstKey !== undefined) {
      regexCache.delete(firstKey);
    }
  }
  const regex = createSafeRegExp(pattern, flags);
  regexCache.set(key, regex);
  return regex;
}
```
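Run as a standalone sketch, the suggested fix behaves as the review describes. Note that `createSafeRegExp` lives elsewhere in the repo; the stub below (compile, return `null` on syntax errors) is an assumption for the sake of a runnable example:

```typescript
const MAX_CACHE_SIZE = 500;
const regexCache = new Map<string, RegExp | null>();

// Stub: the real createSafeRegExp presumably does stricter validation.
function createSafeRegExp(pattern: string, flags: string): RegExp | null {
  try {
    return new RegExp(pattern, flags);
  } catch {
    return null;
  }
}

function getCachedRegex(pattern: string, flags: string): RegExp | null {
  // Reject oversized patterns before building the key, so an attacker
  // cannot fill the cache with arbitrarily large key strings.
  if (pattern.length > 100) {
    return null;
  }
  const key = `${pattern}\0${flags}`;
  if (regexCache.has(key)) {
    return regexCache.get(key) ?? null;
  }
  if (regexCache.size >= MAX_CACHE_SIZE) {
    const firstKey = regexCache.keys().next().value;
    if (firstKey !== undefined) {
      regexCache.delete(firstKey);
    }
  }
  const regex = createSafeRegExp(pattern, flags);
  regexCache.set(key, regex);
  return regex;
}

console.log(getCachedRegex("x".repeat(200), "i")); // null, and nothing is cached
console.log(regexCache.size); // 0
console.log(getCachedRegex("foo\\d+", "i")?.test("foo123")); // true
console.log(regexCache.size); // 1
```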
📝 Walkthrough

Introduces a module-level RegExp cache in TriggerMatcher.ts with a bounded size of 500 entries. A getCachedRegex helper function compiles and caches RegExp objects, evicting the oldest entry when the cache is full. The matchesPattern and matchesIgnorePatterns methods are updated to use this cache instead of calling createSafeRegExp directly.
🧹 Nitpick comments (1)
src/main/services/error/TriggerMatcher.ts (1)
34-42: Implement true LRU eviction by refreshing cache hits.

The current implementation evicts the oldest inserted entry, not the least recently used one. `Map.prototype.get()` does not change iteration order, so frequently accessed patterns can still be evicted under heavy workload churn with more than 500 distinct patterns. To implement LRU with a `Map`, delete and re-insert the key on cache hits.

♻️ Suggested change
```diff
 function getCachedRegex(pattern: string, flags: string): RegExp | null {
   const key = `${pattern}\0${flags}`;
   if (regexCache.has(key)) {
-    return regexCache.get(key) ?? null;
+    const cached = regexCache.get(key) ?? null;
+    regexCache.delete(key);
+    regexCache.set(key, cached);
+    return cached;
   }

   // Evict oldest entries when cache is full
   if (regexCache.size >= MAX_CACHE_SIZE) {
     const firstKey = regexCache.keys().next().value;
     if (firstKey !== undefined) {
       regexCache.delete(firstKey);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/services/error/TriggerMatcher.ts` around lines 34 - 42, The regexCache currently treats hits as accesses but doesn't update order, so implement true LRU in TriggerMatcher by, when a cache hit occurs (check regexCache.has(key) in the lookup logic), retrieve the value via regexCache.get(key), then refresh its recency by deleting and re-inserting the same key/value pair into regexCache (delete(key); set(key, value)) before returning it; keep the existing eviction logic using MAX_CACHE_SIZE to remove regexCache.keys().next().value when size exceeds the limit.
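The delete-and-re-insert trick the review suggests works because a `Map` iterates in insertion order. A self-contained sketch (with a tiny cache size and hypothetical `lruGet`/`lruSet` names, purely for demonstration):

```typescript
const MAX_CACHE_SIZE = 3; // small so eviction is easy to observe
const lru = new Map<string, number>();

function lruGet(key: string): number | undefined {
  if (!lru.has(key)) return undefined;
  const value = lru.get(key)!;
  // Refresh recency: delete + re-insert moves the key to the
  // "most recently used" end of the Map's iteration order.
  lru.delete(key);
  lru.set(key, value);
  return value;
}

function lruSet(key: string, value: number): void {
  if (lru.has(key)) {
    lru.delete(key);
  } else if (lru.size >= MAX_CACHE_SIZE) {
    // Evict the least recently used entry: the first key in iteration order.
    const oldest = lru.keys().next().value;
    if (oldest !== undefined) lru.delete(oldest);
  }
  lru.set(key, value);
}

lruSet("a", 1);
lruSet("b", 2);
lruSet("c", 3);
lruGet("a"); // touch "a" so it becomes most recently used
lruSet("d", 4); // evicts "b" (least recently used), not "a"
console.log([...lru.keys()].join(",")); // "c,a,d"
```

With plain FIFO eviction, inserting "d" would have evicted "a" despite its recent use; the refresh on read is what makes the difference.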
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3f30424b-40c7-41da-b19b-afe4c902b53e
📒 Files selected for processing (1)
src/main/services/error/TriggerMatcher.ts
Summary
- Bounded module-level cache of compiled `RegExp` objects in `TriggerMatcher`
- `matchesPattern()` and `matchesIgnorePatterns()` now use `getCachedRegex()` instead of recompiling via `createSafeRegExp()` on every call

Addresses #95
Test plan
🤖 Generated with Claude Code