Add proactive context window management to prevent AI API errors #708

2witstudios wants to merge 3 commits into master from
Conversation
Two-pronged fix for AI_APICallError when conversations exceed model context windows (e.g. OpenRouter's 400k token limit):

**Prevention (proactive truncation)**

- Move modelMessages conversion to after the system prompt is built so we have accurate token budgets before calling streamText
- Use determineMessagesToInclude() from ai-context-calculator to trim the oldest messages first, reserving 75% of the context window for input and leaving 25% headroom for output tokens and estimator inaccuracies
- Log a warning when truncation occurs, with before/after message counts
- Add OpenRouter provider context windows to getContextWindowSize so common models (DeepSeek, Qwen, Llama, Mistral, Gemini, Claude) get accurate limits instead of the 200k fallback

**Better errors (when context still exceeds the window after truncation)**

- Detect context-length errors in the route error handler and return HTTP 413 with error key 'context_length_exceeded' instead of a generic 500
- Add an isContextLengthError() helper to error-messages.ts that matches provider-specific phrasing (OpenRouter, Anthropic, OpenAI, etc.)
- getAIErrorMessage() now returns a clear, actionable message for context errors: "The conversation is too long for this model's context window. Older messages have been trimmed to fit — try sending your message again."
- Replace hardcoded error-message logic in SidebarChatTab with getAIErrorMessage() so all three chat surfaces handle errors consistently

https://claude.ai/code/session_011dBcfJNRsawEMzZLb67Z7m
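As a rough illustration of the budgeting described above, the following TypeScript sketch shows the 75%-budget math and oldest-first trimming. The names mirror the PR (determineMessagesToInclude, estimateTokens), but the signatures and the 4-chars-per-token heuristic here are assumptions, not the actual implementation:

```typescript
// Illustrative sketch only; the real helpers live in ai-context-calculator.
interface ChatMessage { role: 'user' | 'assistant'; content: string; }

// Rough heuristic: ~4 characters per token for Latin text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Reserve 75% of the context window for input; the remaining 25% is
// headroom for output tokens and estimator inaccuracy.
function computeInputBudget(contextWindow: number): number {
  return Math.floor(contextWindow * 0.75);
}

// Drop the oldest messages first until what remains fits the budget.
function determineMessagesToInclude(
  messages: ChatMessage[],
  inputBudget: number,
  systemPromptTokens: number,
  toolTokens: number
): { includedMessages: ChatMessage[]; wasTruncated: boolean } {
  const available = inputBudget - systemPromptTokens - toolTokens;
  const included: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest so the most recent turns survive truncation.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > available) break;
    included.unshift(messages[i]);
    used += cost;
  }
  return { includedMessages: included, wasTruncated: included.length < messages.length };
}
```

Walking from newest to oldest guarantees the latest user turn is the last thing to be dropped.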
📝 Walkthrough

Adds proactive context-length management to the AI chat route: computes model context window and token estimates, truncates messages to fit a 75% input budget, logs truncation metadata, and returns 413 context_length_exceeded for context-limit errors. Also extends provider-specific context-window mappings and centralizes UI error messaging.
Sequence Diagram

sequenceDiagram
participant Client
participant ChatAPI as "AI Chat API Route"
participant Calculator as "Context Calculator"
participant Estimator as "Token Estimator"
participant AIModel as "AI Model"
participant ErrorHandler as "Error Handler"
Client->>ChatAPI: POST /api/ai/chat (messages, systemPrompt, model, provider)
ChatAPI->>Calculator: getContextWindowSize(provider, model)
Calculator-->>ChatAPI: contextWindow
ChatAPI->>Estimator: estimateSystemPromptTokens(fullSystemPrompt)
Estimator-->>ChatAPI: systemPromptTokens
ChatAPI->>Estimator: estimateToolDefinitionTokens(tools)
Estimator-->>ChatAPI: toolTokens
ChatAPI->>ChatAPI: compute inputBudget (75% of contextWindow)
ChatAPI->>Estimator: determineMessagesToInclude(messages, inputBudget)
Estimator-->>ChatAPI: includedMessages (truncated if needed)
alt Truncation occurred
ChatAPI->>ChatAPI: log truncation metadata (counts, tokens, budgets)
end
ChatAPI->>AIModel: stream with modelMessages (includedMessages + system/tool defs)
alt Context length error from model
AIModel-->>ErrorHandler: 413 context-length error
ErrorHandler-->>Client: 413 response { key: "context_length_exceeded", message }
else Successful stream
AIModel-->>Client: streamed response
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed checks
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/lib/src/monitoring/ai-context-calculator.ts (1)
220-243: OpenRouter GPT matching is narrower than the direct-provider GPT block.

The OpenRouter section only matches gpt-4o and gpt-4-turbo (line 228), while the direct OpenAI block (lines 141-165) also covers GPT-5.x variants. If a user routes a GPT-5 model through OpenRouter, it will fall to the 200k default instead of the actual 272k–400k window. Not a bug per se — 200k is a safe underestimate — but worth noting for future model additions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 220 - 243, Update the OpenRouter model-matching branch in ai-context-calculator.ts to include GPT-5 variants so routed GPT-5 models get the same token window as the direct-provider GPT logic: add checks against modelLower for strings like "gpt-5", "gpt-5o", "gpt-5-turbo" (and any project-specific GPT-5 naming you supported in the OpenAI block) and return the same token limit used for GPT-5 in the direct OpenAI branch (use the same numeric constant there, e.g., 272_000 or the exact constant used in the OpenAI block) instead of falling through to the 200_000 default; modify the if chain near the other GPT checks (the block referencing providerLower === 'openrouter' and modelLower) to include these new condition(s).

apps/web/src/lib/ai/shared/error-messages.ts (1)
47-62: isRateLimitError overlaps with context-length phrases but is currently unused — consider adding defensive logic for future safety.

While isRateLimitError checks for the "limit" substring (line 71), which overlaps with "token limit" from isContextLengthError, this function is not currently called anywhere in the codebase. The code is safe today because getAIErrorMessage doesn't use isRateLimitError and checks context length first. However, if isRateLimitError is ever used independently in the future, it would misclassify context-length errors. Adding a guard to exclude context-length errors would be defensive refactoring.

♻️ Suggested improvement

```diff
 export function isRateLimitError(errorMessage: string | undefined): boolean {
   if (!errorMessage) return false;
+  // Exclude context-length errors that also contain "limit"
+  if (isContextLengthError(errorMessage)) return false;
   return (
     errorMessage.toLowerCase().includes('rate') ||
     errorMessage.toLowerCase().includes('limit') ||
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/lib/ai/shared/error-messages.ts` around lines 47 - 62, isRateLimitError can misclassify context-length issues because it looks for generic "limit" substrings; update isRateLimitError to early-return false when isContextLengthError(errorMessage) is true (or otherwise explicitly exclude context-length phrases) so rate-limit detection never fires for context/window/token-limit errors — locate the isRateLimitError and isContextLengthError functions and add the defensive guard in isRateLimitError (or add an explicit negative-condition for tokens/context/window phrases) to ensure the two checks don't overlap.
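To make the check ordering concrete, here is a self-contained sketch of the guarded classification. The phrase lists are illustrative only; the real helpers in error-messages.ts match more provider-specific wording:

```typescript
// Sketch: check context-length phrasing before generic rate-limit phrasing,
// so "token limit" errors are never misclassified as rate limits.
function isContextLengthError(errorMessage: string | undefined): boolean {
  if (!errorMessage) return false;
  const msg = errorMessage.toLowerCase();
  return (
    msg.includes('context length') ||
    msg.includes('context window') ||
    msg.includes('token limit') ||
    msg.includes('maximum context')
  );
}

function isRateLimitError(errorMessage: string | undefined): boolean {
  if (!errorMessage) return false;
  // Guard: a context-length error also contains "limit", so exclude it first.
  if (isContextLengthError(errorMessage)) return false;
  const msg = errorMessage.toLowerCase();
  return msg.includes('rate') || msg.includes('limit') || msg.includes('429');
}
```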
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 829-835: The token estimator (estimateTokens in
ai-context-calculator.ts) uses a fixed 4-chars-per-token heuristic which
undercounts CJK text; update estimateTokens to detect non-ASCII/CJK content
(e.g., check for Unicode ranges or presence of multibyte chars) and use a more
conservative ratio (e.g., 2 chars/token) or make the ratio configurable, then
recalculate calls that rely on it (estimateSystemPromptTokens,
estimateToolDefinitionTokens, and any callers in route.ts) so truncation
reserves enough budget for CJK; keep the existing fallback error handler but
prefer preventing rejections by selecting the conservative ratio when
non-ASCII/CJK is detected.
- Around line 1241-1250: The code currently returns raw provider error text via
errorMsg when isContextLengthError(...) triggers; replace that leak by returning
a sanitized constant or summarized detail (e.g., details: 'context length
exceeded') instead of the raw errorMsg, or call a small sanitizer function
(e.g., sanitizeProviderMessage(errorMsg)) before including it; update the
NextResponse.json call in the isContextLengthError branch to use the
sanitized/constant detail and ensure the user-facing message string "Older
messages have been trimmed to fit — try sending your message again" (from the
proactive truncation logic around the proactive truncation block) is not used if
truncation already ran and likely won’t help—either remove that retry suggestion
or make it conditional based on whether truncation actually occurred.
---
Nitpick comments:
In `@apps/web/src/lib/ai/shared/error-messages.ts`:
- Around line 47-62: isRateLimitError can misclassify context-length issues
because it looks for generic "limit" substrings; update isRateLimitError to
early-return false when isContextLengthError(errorMessage) is true (or otherwise
explicitly exclude context-length phrases) so rate-limit detection never fires
for context/window/token-limit errors — locate the isRateLimitError and
isContextLengthError functions and add the defensive guard in isRateLimitError
(or add an explicit negative-condition for tokens/context/window phrases) to
ensure the two checks don't overlap.
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 220-243: Update the OpenRouter model-matching branch in
ai-context-calculator.ts to include GPT-5 variants so routed GPT-5 models get
the same token window as the direct-provider GPT logic: add checks against
modelLower for strings like "gpt-5", "gpt-5o", "gpt-5-turbo" (and any
project-specific GPT-5 naming you supported in the OpenAI block) and return the
same token limit used for GPT-5 in the direct OpenAI branch (use the same
numeric constant there, e.g., 272_000 or the exact constant used in the OpenAI
block) instead of falling through to the 200_000 default; modify the if chain
near the other GPT checks (the block referencing providerLower === 'openrouter'
and modelLower) to include these new condition(s).
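The CJK-aware estimator requested in the first inline comment above could be sketched as follows. The 20% threshold and the 2- vs 4-chars-per-token ratios follow the commit notes in this thread, but the exact implementation may differ:

```typescript
// Sketch: count non-ASCII code points without a regex, and switch to
// ~2 chars/token when more than 20% of the text is non-ASCII (CJK, emoji).
function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  let nonAsciiCount = 0;
  for (const char of text) { // for...of iterates by code point, handling surrogate pairs
    if ((char.codePointAt(0) ?? 0) > 0x7f) nonAsciiCount++;
  }
  const charsPerToken = nonAsciiCount / text.length > 0.2 ? 2 : 4;
  return Math.ceil(text.length / charsPerToken);
}
```

The conservative 2-chars/token ratio makes truncation reserve more budget for CJK-heavy conversations, which the fixed 4-chars/token heuristic undercounted.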
- Fix TypeScript build error: cast includedMessages to UIMessage[] for convertToModelMessages compatibility
- Improve CJK token estimation: use 2 chars/token when >20% non-ASCII content detected (vs 4 chars/token for Latin text)
- Sanitize 413 error response: replace raw provider error with constant, make retry suggestion conditional on whether truncation already ran
- Add isContextLengthError guard to isRateLimitError to prevent misclassification of context-length errors as rate-limit errors
- Add GPT-5 variant matching to OpenRouter section of getContextWindowSize

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressed all CodeRabbit review feedback in 8a5e5d5

Actionable comments (fixed):
Nitpick comments (fixed):
Build fix:
The wasTruncated variable was declared inside the try block but referenced in the outer catch block for conditional error messaging. Hoist it to function-level scope so the 413 error response can correctly determine whether truncation already occurred. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
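A minimal sketch of that scoping fix, with the handler reduced to its essentials (the helper shape and messages here are illustrative, not the actual route code):

```typescript
// Sketch: wasTruncated is declared before the try block so the catch-side
// error builder can read it and pick the right 413 message.
function runWithTruncationFlag(
  work: (markTruncated: () => void) => void
): { status: number; message: string } {
  let wasTruncated = false; // hoisted to function scope, not inside try
  try {
    work(() => { wasTruncated = true; });
    return { status: 200, message: 'ok' };
  } catch {
    return {
      status: 413,
      message: wasTruncated
        ? 'Still exceeds the context window even after trimming.'
        : 'Too long for the context window. Please start a new conversation.',
    };
  }
}
```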
Actionable comments posted: 4
🧹 Nitpick comments (1)
packages/lib/src/monitoring/ai-context-calculator.ts (1)
55-66: Consider consolidating token estimation logic into one shared implementation.
estimateTokens here now differs from packages/lib/src/monitoring/ai-monitoring.ts:404-407 (still fixed 4 chars/token). Keeping two heuristics will drift truncation vs. monitoring metrics over time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 55 - 66, The repository now has two different token-estimation heuristics (estimateTokens in ai-context-calculator.ts and the fixed 4-chars/token logic in ai-monitoring.ts) which can cause drift; replace the duplicated logic by extracting a single shared utility (e.g., export a consolidateEstimateTokens function) and import it from both places so both monitoring and context truncation use the same algorithm (include the non-ASCII detection and charsPerToken decision currently in estimateTokens). Update references to call the shared function (estimateTokens / consolidateEstimateTokens) and remove the old inline heuristic in ai-monitoring.ts to ensure consistent token counting across the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 1249-1252: The error message for the non-truncated branch
incorrectly implies older messages were trimmed; update the logic that builds
the error response (the object using wasTruncated, message, and details in
route.ts) so when wasTruncated is false the message does not say messages were
trimmed—instead return a clear statement like "The conversation is too long for
this model's context window. Please start a new conversation or shorten the
conversation" while preserving details: 'context_length_exceeded'.
- Around line 837-844: After calling determineMessagesToInclude, add a guard
that ensures the current/latest user turn was not dropped: find the last user
message in sanitizedMessages (e.g., lastUserIndex or by role === 'user') and
verify includedMessages is non-empty and contains that message (compare id or
index); if it doesn't, return a clear client error (400) or an explicit error
response indicating the user input exceeds the token budget instead of
proceeding. Update the logic around determineMessagesToInclude /
includedMessages / wasTruncated in route.ts to perform this check and
short-circuit with a helpful error when the latest user turn would be removed.
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 225-255: The OpenRouter-specific model limit block is being
bypassed because earlier generic model-name branches match first; update those
earlier checks to exclude OpenRouter by adding a guard like providerLower !==
'openrouter' to the generic model-name conditionals (the branches that inspect
modelLower for 'gpt', 'claude', 'gemini', etc.) or alternatively move the
OpenRouter block earlier so it runs before the generic branches; locate the
providerLower and modelLower checks in ai-context-calculator.ts and adjust the
conditionals around the generic model checks or reposition the OpenRouter if
(providerLower === 'openrouter') block so OpenRouter-specific mappings are
applied correctly.
- Line 60: The regex /[^\x00-\x7F]/g used to compute nonAsciiCount violates the
linter; replace that regex-based approach with explicit code-point checking:
iterate over the string (e.g., using for...of or Array.from) and count
characters whose codePointAt(0) > 0x7F, then store that count in nonAsciiCount.
Update the expression that computes nonAsciiCount accordingly so it handles full
Unicode code points (surrogate pairs) and removes the problematic regex.
---
Nitpick comments:
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 55-66: The repository now has two different token-estimation
heuristics (estimateTokens in ai-context-calculator.ts and the fixed
4-chars/token logic in ai-monitoring.ts) which can cause drift; replace the
duplicated logic by extracting a single shared utility (e.g., export a
consolidateEstimateTokens function) and import it from both places so both
monitoring and context truncation use the same algorithm (include the non-ASCII
detection and charsPerToken decision currently in estimateTokens). Update
references to call the shared function (estimateTokens /
consolidateEstimateTokens) and remove the old inline heuristic in
ai-monitoring.ts to ensure consistent token counting across the codebase.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- apps/web/src/app/api/ai/chat/route.ts
- apps/web/src/lib/ai/shared/error-messages.ts
- packages/lib/src/monitoring/ai-context-calculator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/web/src/lib/ai/shared/error-messages.ts
```ts
const truncationResult = determineMessagesToInclude(
  sanitizedMessages,
  inputBudget,
  systemPromptTokens,
  toolTokens
);
const { includedMessages } = truncationResult;
wasTruncated = truncationResult.wasTruncated;
```
Guard against truncation removing the current user turn entirely.
If the latest message alone exceeds budget, includedMessages can be empty. The call then proceeds with no user prompt, producing an unrelated assistant response instead of a clear error.
✅ Suggested fix

```diff
 const { includedMessages } = truncationResult;
 wasTruncated = truncationResult.wasTruncated;
+
+// If even the newest user turn cannot fit, fail fast with context-length error
+if (includedMessages.length === 0) {
+  return NextResponse.json(
+    {
+      error: 'context_length_exceeded',
+      message: 'Your latest message is too large for this model context window. Please shorten it or start a new conversation.',
+      details: 'context_length_exceeded',
+    },
+    { status: 413 }
+  );
+}
```

🤖 Prompt for AI Agents
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/app/api/ai/chat/route.ts` around lines 837 - 844, After calling
determineMessagesToInclude, add a guard that ensures the current/latest user
turn was not dropped: find the last user message in sanitizedMessages (e.g.,
lastUserIndex or by role === 'user') and verify includedMessages is non-empty
and contains that message (compare id or index); if it doesn't, return a clear
client error (400) or an explicit error response indicating the user input
exceeds the token budget instead of proceeding. Update the logic around
determineMessagesToInclude / includedMessages / wasTruncated in route.ts to
perform this check and short-circuit with a helpful error when the latest user
turn would be removed.
```ts
message: wasTruncated
  ? 'The conversation still exceeds this model\'s context window even after trimming. Please start a new conversation.'
  : 'The conversation is too long for this model\'s context window. Older messages have been trimmed — try sending your message again.',
details: 'context_length_exceeded',
```
The non-truncated error message is misleading.
When wasTruncated is false, the response currently says older messages “have been trimmed,” but no trimming occurred in that branch.
✅ Suggested wording update

```diff
- : 'The conversation is too long for this model\'s context window. Older messages have been trimmed — try sending your message again.',
+ : 'The conversation is too long for this model\'s context window. Please try again or start a new conversation.',
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/app/api/ai/chat/route.ts` around lines 1249 - 1252, The error
message for the non-truncated branch incorrectly implies older messages were
trimmed; update the logic that builds the error response (the object using
wasTruncated, message, and details in route.ts) so when wasTruncated is false
the message does not say messages were trimmed—instead return a clear statement
like "The conversation is too long for this model's context window. Please start
a new conversation or shorten the conversation" while preserving details:
'context_length_exceeded'.
```ts
return Math.ceil(text.length / 4);
// Detect non-ASCII heavy content (CJK, emoji, etc.)
// CJK Unified Ideographs, Hiragana, Katakana, Hangul, etc.
const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check if the file exists and read the content around line 60
fd "ai-context-calculator.ts" --type f
```

🏁 Script executed:

```shell
# Once we confirm the file exists, read it to verify the exact content at line 60
cat -n packages/lib/src/monitoring/ai-context-calculator.ts | sed -n '55,65p'
```

🏁 Script executed:

```shell
# Check for linter configuration in the packages/lib directory
find packages/lib -maxdepth 3 -name "*.json" -o -name "*.mjs" -o -name "biome.json" -o -name ".eslintrc*" | head -20
```

🏁 Script executed:

```shell
# Check if Biome is configured at the root or in packages
find . -maxdepth 2 -name "biome.json" -o -name "biome.jsonc"
```

🏁 Script executed:

```shell
# Read the biome.json configuration
cat biome.json
```
🏁 Script executed:

```shell
# Check the suggested fix in the review comment - verify if the logic is equivalent
python3 << 'EOF'
# Test if the suggested fix produces the same results as the original
# Original: text.match(/[^\x00-\x7F]/g) matches all non-ASCII characters
# Suggested: reduce with codePointAt check for > 0x7f
test_strings = [
    "hello",   # All ASCII
    "hello🌍", # ASCII + emoji
    "你好",    # Chinese characters
    "café",    # ASCII + accented char
    "مرحبا",   # Arabic
]
for test_str in test_strings:
    # Simulate original regex behavior
    original_count = len([c for c in test_str if ord(c) > 0x7f])
    # Simulate suggested fix behavior
    suggested_count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            suggested_count += 1
    print(f"String: '{test_str}' | Original: {original_count} | Suggested: {suggested_count} | Match: {original_count == suggested_count}")
EOF
```
🏁 Script executed:

```shell
# Check if noControlCharactersInRegex is part of Biome's recommended rules
# and verify if it would be active with the current config
web_search "Biome noControlCharactersInRegex rule"
```

🏁 Script executed:

```shell
# Search for other similar patterns that might have the same issue
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts --type=tsx packages/
```
🏁 Script executed:

```shell
# Verify the suggested fix doesn't have any edge cases with surrogate pairs or multi-byte chars
python3 << 'EOF'
# Test edge cases with the suggested fix
test_cases = [
    ("hello", 0),        # All ASCII
    ("café", 1),         # Single accented character
    ("🌍", 1),           # Single emoji (surrogate pair in JS)
    ("你好世界", 4),      # Chinese characters
    ("hello🌍world", 1), # Mixed ASCII and emoji
    ("\x00\x01\x7f", 0), # Control characters at edge of range
    ("", 0),             # Empty string
]
for test_str, expected in test_cases:
    # Suggested fix logic
    count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            count += 1
    status = "✓" if count == expected else "✗"
    print(f"{status} String: {repr(test_str)} | Expected: {expected} | Got: {count}")
EOF
```

🏁 Script executed:

```shell
# Search for control character patterns in the codebase using ts file type only
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts packages/
```
🏁 Script executed:

```shell
# Verify Biome rule documentation is available and check if recommended includes the suspicious rule
python3 << 'EOF'
# Check Biome's default rule settings for noControlCharactersInRegex
# Based on the biome.json, the "suspicious" category doesn't explicitly disable the rule,
# which means it uses the recommended default
biome_config = {
    "suspicious": {
        "noArrayIndexKey": "off",
        "noExplicitAny": "off",
        "noImplicitAnyLet": "off",
        "noDoubleEquals": "off",
        "noGlobalIsNan": "off",
        "noAssignInExpressions": "off"
        # noControlCharactersInRegex is NOT listed, so it uses the default (recommended = true for suspicious)
    }
}
print("Analysis of biome.json configuration:")
print("- Linter is enabled: true")
print("- 'recommended' is enabled globally for linter rules")
print("- 'suspicious' category does NOT explicitly disable 'noControlCharactersInRegex'")
print("- Therefore, the rule IS active by default")
print("\nThe regex pattern /[^\\x00-\\x7F]/ contains control character escapes,")
print("which would trigger the 'noControlCharactersInRegex' rule in Biome.")
EOF
```
Replace the regex pattern to resolve the noControlCharactersInRegex Biome linter violation.
The pattern /[^\x00-\x7F]/g at line 60 triggers Biome's suspicious/noControlCharactersInRegex rule, which is enabled by default. Use explicit code point checking instead:
Suggested fix
```diff
- const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
+ const nonAsciiCount = [...text].reduce((count, char) => {
+   const codePoint = char.codePointAt(0) ?? 0;
+   return count + (codePoint > 0x7f ? 1 : 0);
+ }, 0);
```
🧰 Tools
🪛 Biome (2.4.4)
[error] 60-60: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/lib/src/monitoring/ai-context-calculator.ts` at line 60, The regex
/[^\x00-\x7F]/g used to compute nonAsciiCount violates the linter; replace that
regex-based approach with explicit code-point checking: iterate over the string
(e.g., using for...of or Array.from) and count characters whose codePointAt(0) >
0x7F, then store that count in nonAsciiCount. Update the expression that
computes nonAsciiCount accordingly so it handles full Unicode code points
(surrogate pairs) and removes the problematic regex.
```ts
// OpenRouter - use model-specific limits where known, else 200k conservative default
if (providerLower === 'openrouter') {
  // Claude models via OpenRouter
  if (modelLower.includes('claude')) return 200_000;
  // Gemini models via OpenRouter
  if (modelLower.includes('gemini-2.5')) return 1_000_000;
  if (modelLower.includes('gemini-2.0') || modelLower.includes('gemini-1.5')) return 1_000_000;
  // GPT models via OpenRouter
  if (modelLower.includes('gpt-5.2')) {
    return modelLower.includes('mini') || modelLower.includes('nano') ? 256_000 : 400_000;
  }
  if (modelLower.includes('gpt-5.1')) return 400_000;
  if (modelLower.includes('gpt-5')) {
    return modelLower.includes('mini') || modelLower.includes('nano') ? 128_000 : 272_000;
  }
  if (modelLower.includes('gpt-4o') || modelLower.includes('gpt-4-turbo')) return 128_000;
  // DeepSeek models - commonly 64k or 128k
  if (modelLower.includes('deepseek-r1') || modelLower.includes('deepseek-v3')) return 128_000;
  if (modelLower.includes('deepseek')) return 64_000;
  // Qwen models
  if (modelLower.includes('qwen-2.5') || modelLower.includes('qwq')) return 128_000;
  if (modelLower.includes('qwen')) return 32_000;
  // Llama models
  if (modelLower.includes('llama-3') || modelLower.includes('llama3')) return 128_000;
  if (modelLower.includes('llama')) return 32_000;
  // Mistral models
  if (modelLower.includes('mistral-large') || modelLower.includes('mistral-nemo')) return 128_000;
  if (modelLower.includes('mistral')) return 32_000;
  // OpenRouter platform hard cap is 400k for many endpoints - use 200k as safe default
  return 200_000;
}
```
OpenRouter-specific limits are partially shadowed by earlier branches.
For OpenRouter models, checks at Line 146 / Line 173 / Line 184 can match first via model name (gpt, claude, gemini), so this block won’t run for many of the models it intends to handle. That defeats the provider-specific conservative mapping.
✅ Suggested fix direction

```diff
 export function getContextWindowSize(model: string, provider?: string): number {
   const providerLower = provider?.toLowerCase() || '';
   const modelLower = model.toLowerCase();
+  // OpenRouter first to ensure provider-specific limits are applied
+  if (providerLower === 'openrouter') {
+    // ...existing OpenRouter mapping...
+    return 200_000;
+  }
+
   // OpenAI models
-  if (providerLower === 'openai' || modelLower.includes('gpt')) {
+  if (providerLower === 'openai' || (!providerLower && modelLower.includes('gpt'))) {
     ...
   }
```
Verify each finding against the current code and only fix it if needed.
In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 225 - 255,
The OpenRouter-specific model limit block is being bypassed because earlier
generic model-name branches match first; update those earlier checks to exclude
OpenRouter by adding a guard like providerLower !== 'openrouter' to the generic
model-name conditionals (the branches that inspect modelLower for 'gpt',
'claude', 'gemini', etc.) or alternatively move the OpenRouter block earlier so
it runs before the generic branches; locate the providerLower and modelLower
checks in ai-context-calculator.ts and adjust the conditionals around the
generic model checks or reposition the OpenRouter if (providerLower ===
'openrouter') block so OpenRouter-specific mappings are applied correctly.
Reviews proactive context window management for AI chat, covering token estimation accuracy, OpenRouter model registry, error handling, type safety, and OWASP compliance. Recommends adding tests and verifying error propagation before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix wasTruncated message inversion: non-truncated error no longer
claims messages were trimmed
- Guard empty messages: return 413 when truncation leaves zero messages
instead of sending empty context to streamText
- Fix estimateMessageTokens: match actual Vercel AI SDK tool-{name}
part format (input/output/state) instead of legacy tool-call/tool-result
- Reconcile context windows: extract MODEL_CONTEXT_WINDOWS into shared
model-context-windows.ts as single source of truth (fixes glm-5
200000→202752, minimax m2.5 via OpenRouter 1M→204800 divergence)
- Fix isContextLengthError: replace broad '413' string match with
specific HTTP status pattern to prevent false positives
- Fix estimateTokens: replace regex array allocation with efficient
charCode loop for CJK detection
- Add 42 unit tests for isContextLengthError, estimateTokens,
estimateMessageTokens, getContextWindowSize, determineMessagesToInclude
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix inverted wasTruncated message logic in catch block 413 response
- Guard against empty includedMessages after truncation (return 413 early)
- Handle tool-{toolName} part format in estimateMessageTokens (PageSpace DB format)
- Extract MODEL_CONTEXT_WINDOWS to shared model-context-windows.ts (single source of truth)
- Narrow isContextLengthError '413' check to require context/token keywords
- Replace regex with for-loop counter in estimateTokens for memory efficiency
- Add unit tests for isContextLengthError and CJK-aware estimateTokens
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
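The narrowed '413' check from this commit can be sketched as follows; the keyword list is an assumption based on the commit message, not the exact code:

```typescript
// Sketch: a bare "413" substring can appear in unrelated text (request IDs,
// byte counts), so only treat it as a context-length signal when paired with
// a context/token keyword.
function isContextLengthError(errorMessage: string | undefined): boolean {
  if (!errorMessage) return false;
  const msg = errorMessage.toLowerCase();
  if (msg.includes('context length') || msg.includes('context window') || msg.includes('token limit')) {
    return true;
  }
  return msg.includes('413') && (msg.includes('context') || msg.includes('token'));
}
```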
Summary
This PR implements proactive context window management for AI chat conversations to prevent token limit errors from AI providers. Instead of letting conversations fail when they exceed a model's context window, the system now intelligently truncates older messages before sending requests to the API.
Key Changes
Context-aware message truncation: Added logic to POST /api/ai/chat that calculates available token budget based on the model's context window, system prompt size, and tool definitions, then includes only the most recent messages that fit within that budget.

Enhanced context window detection: Expanded getContextWindowSize() in ai-context-calculator.ts with model-specific context limits for OpenRouter providers (Claude, Gemini, GPT, DeepSeek, Qwen, Llama, Mistral models), replacing the previous generic 200k fallback.

Context length error handling:
- Added an isContextLengthError() utility function to detect context/token limit errors from various AI providers
- The route returns the context_length_exceeded error code when detected
- Updated getAIErrorMessage() to provide user-friendly messaging for context length errors

Improved error messaging in UI: Refactored SidebarChatTab.tsx to use the centralized getAIErrorMessage() function instead of inline error detection logic, ensuring consistent error messaging across the application.

Implementation Details
https://claude.ai/code/session_011dBcfJNRsawEMzZLb67Z7m
Summary by CodeRabbit
New Features
Bug Fixes
Refactor