
Add proactive context window management to prevent AI API errors#708

Open
2witstudios wants to merge 3 commits into master from claude/fix-context-length-errors-WFBfz

Conversation

2witstudios (Owner) commented Feb 21, 2026

Summary

This PR implements proactive context window management for AI chat conversations to prevent token limit errors from AI providers. Instead of letting conversations fail when they exceed a model's context window, the system now intelligently truncates older messages before sending requests to the API.

Key Changes

  • Context-aware message truncation: Added logic to POST /api/ai/chat that calculates available token budget based on the model's context window, system prompt size, and tool definitions, then includes only the most recent messages that fit within that budget.

  • Enhanced context window detection: Expanded getContextWindowSize() in ai-context-calculator.ts with model-specific context limits for OpenRouter providers (Claude, Gemini, GPT, DeepSeek, Qwen, Llama, Mistral models), replacing the previous generic 200k fallback.

  • Context length error handling:

    • Added isContextLengthError() utility function to detect context/token limit errors from various AI providers
    • Updated error response to return HTTP 413 status with context_length_exceeded error code when detected
    • Enhanced getAIErrorMessage() to provide user-friendly messaging for context length errors
  • Improved error messaging in UI: Refactored SidebarChatTab.tsx to use centralized getAIErrorMessage() function instead of inline error detection logic, ensuring consistent error messaging across the application.
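The context-length detection helper described above can be sketched as follows. This is a hypothetical reconstruction: the actual provider phrases matched in error-messages.ts may differ.

```typescript
// Sketch of isContextLengthError(); the matched substrings here are
// illustrative guesses at common provider wording, not the shipped list.
export function isContextLengthError(errorMessage: string | undefined): boolean {
  if (!errorMessage) return false;
  const msg = errorMessage.toLowerCase();
  return (
    msg.includes("context length") ||   // OpenAI-style wording
    msg.includes("context window") ||
    msg.includes("maximum context") ||
    msg.includes("token limit") ||      // OpenRouter/Anthropic-style wording
    msg.includes("too many tokens")
  );
}
```

Matching on substrings keeps the check cheap and provider-agnostic, at the cost of needing new phrases as providers change their error text.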

Implementation Details

  • Reserves 25% of the context window as headroom for output tokens and tokenizer inaccuracies
  • Logs truncation events with detailed metrics (original/included message counts, token budgets) for monitoring
  • Uses database-loaded messages (not client messages) for accurate context calculation
  • Converts messages to model format after truncation to ensure accurate token estimation
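The budget math above can be sketched roughly like this. The function names mirror the PR description (determineMessagesToInclude and friends), but the signatures and the simple newest-first loop are assumptions, not the shipped implementation.

```typescript
// Illustrative sketch of the token-budget calculation and truncation.
const HEADROOM_RATIO = 0.25; // reserved for output tokens + estimator error

function computeInputBudget(
  contextWindow: number,
  systemPromptTokens: number,
  toolTokens: number
): number {
  // 75% of the window is available for input; system prompt and tool
  // definitions are charged against that budget up front.
  const inputBudget = Math.floor(contextWindow * (1 - HEADROOM_RATIO));
  return Math.max(0, inputBudget - systemPromptTokens - toolTokens);
}

interface Msg {
  role: string;
  content: string;
}

function determineMessagesToInclude(
  messages: Msg[],
  budget: number,
  estimate: (text: string) => number
): Msg[] {
  const included: Msg[] = [];
  let used = 0;
  // Walk newest-to-oldest so the most recent turns survive truncation.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i].content);
    if (used + cost > budget) break;
    included.unshift(messages[i]);
    used += cost;
  }
  return included;
}
```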

https://claude.ai/code/session_011dBcfJNRsawEMzZLb67Z7m

Summary by CodeRabbit

  • New Features

    • Chat now proactively trims conversation history to fit model limits, improving reliability for long sessions.
  • Bug Fixes

    • Better handling and clearer user-facing messages when a conversation exceeds context/token limits.
    • More consistent error messaging across the AI assistant.
  • Refactor

    • Centralized AI error messaging and improved token/context sizing logic for more accurate length checks.

Two-pronged fix for AI_APICallError when conversations exceed model
context windows (e.g. OpenRouter 400k token limit):

**Prevention (proactive truncation)**
- Move modelMessages conversion to after system prompt is built so we
  have accurate token budgets before calling streamText
- Use determineMessagesToInclude() from ai-context-calculator to trim
  the oldest messages first, reserving 75% of the context window for
  input and leaving 25% headroom for output tokens and estimator
  inaccuracies
- Log a warning when truncation occurs with before/after message counts
- Add OpenRouter provider context windows to getContextWindowSize so
  common models (DeepSeek, Qwen, Llama, Mistral, Gemini, Claude) get
  accurate limits instead of the 200k fallback
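A minimal sketch of that provider/model lookup, assuming illustrative context-window values — the actual limits in getContextWindowSize may differ:

```typescript
// Hypothetical shape of the OpenRouter-aware lookup; the specific
// token limits below are placeholders, not the values shipped in the PR.
function getContextWindowSize(provider: string, model: string): number {
  const p = provider.toLowerCase();
  const m = model.toLowerCase();
  if (p === "openrouter") {
    if (m.includes("claude")) return 200_000;
    if (m.includes("gemini")) return 1_000_000;
    if (m.includes("deepseek")) return 128_000;
    if (m.includes("qwen")) return 128_000;
    if (m.includes("llama")) return 128_000;
    if (m.includes("mistral")) return 128_000;
  }
  return 200_000; // conservative fallback for unknown models
}
```

Underestimating an unknown model's window is safe (it just truncates earlier); overestimating would reintroduce the API errors this PR prevents.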

**Better errors (when context still exceeds after truncation)**
- Detect context-length errors in the route error handler and return
  HTTP 413 with error key 'context_length_exceeded' instead of a
  generic 500
- Add isContextLengthError() helper to error-messages.ts that matches
  provider-specific phrasing (OpenRouter, Anthropic, OpenAI, etc.)
- getAIErrorMessage() now returns a clear, actionable message for
  context errors: "The conversation is too long for this model's context
  window. Older messages have been trimmed to fit — try sending your
  message again."
- Replace hardcoded error-message logic in SidebarChatTab with
  getAIErrorMessage() so all three chat surfaces handle errors
  consistently

https://claude.ai/code/session_011dBcfJNRsawEMzZLb67Z7m


coderabbitai bot commented Feb 21, 2026

📝 Walkthrough

Adds proactive context-length management to the AI chat route: computes model context window and token estimates, truncates messages to fit a 75% input budget, logs truncation metadata, and returns 413 context_length_exceeded for context-limit errors. Also extends provider-specific context-window mappings and centralizes UI error messaging.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Chat API - Context Management (`apps/web/src/app/api/ai/chat/route.ts`) | Compute contextWindow, estimate system/tool tokens, determine includedMessages with a 75% input budget, set wasTruncated, pass truncated messages to the streaming call, log truncation metadata, and return 413 for context-length errors. |
| Error Utilities (`apps/web/src/lib/ai/shared/error-messages.ts`) | Add isContextLengthError() to detect token/context-limit errors; update getAIErrorMessage() to surface context-length messages and exclude these from rate-limit detection. |
| Client UI Error Display (`apps/web/src/components/layout/right-sidebar/ai-assistant/SidebarChatTab.tsx`) | Replace inline error matching with getAIErrorMessage(error.message) rendering to centralize error messaging. |
| Token & Context Calculator (`packages/lib/src/monitoring/ai-context-calculator.ts`) | Revise the estimateTokens() heuristic for non-ASCII content and expand getContextWindowSize() with provider/model-specific token limits (OpenRouter model mappings and conservative defaults). |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ChatAPI as "AI Chat API Route"
    participant Calculator as "Context Calculator"
    participant Estimator as "Token Estimator"
    participant AIModel as "AI Model"
    participant ErrorHandler as "Error Handler"

    Client->>ChatAPI: POST /api/ai/chat (messages, systemPrompt, model, provider)
    ChatAPI->>Calculator: getContextWindowSize(provider, model)
    Calculator-->>ChatAPI: contextWindow

    ChatAPI->>Estimator: estimateSystemPromptTokens(fullSystemPrompt)
    Estimator-->>ChatAPI: systemPromptTokens
    ChatAPI->>Estimator: estimateToolDefinitionTokens(tools)
    Estimator-->>ChatAPI: toolTokens

    ChatAPI->>ChatAPI: compute inputBudget (75% of contextWindow)
    ChatAPI->>Estimator: determineMessagesToInclude(messages, inputBudget)
    Estimator-->>ChatAPI: includedMessages (truncated if needed)
    alt Truncation occurred
        ChatAPI->>ChatAPI: log truncation metadata (counts, tokens, budgets)
    end

    ChatAPI->>AIModel: stream with modelMessages (includedMessages + system/tool defs)
    alt Context length error from model
        AIModel-->>ErrorHandler: 413 context-length error
        ErrorHandler-->>Client: 413 response { key: "context_length_exceeded", message }
    else Successful stream
        AIModel-->>Client: streamed response
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I counted tokens, nibbled lines of chat,

Trimmed the carrots — messages — to fit the hat.
If tokens overflow and tumblers rattle,
I send a 413 and tidy up the prattle.
Hop, hop — context snug; now bring on the next battle!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: proactive context window management to prevent AI API errors, which directly aligns with the primary objective of calculating context budgets and truncating messages.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/lib/src/monitoring/ai-context-calculator.ts (1)

220-243: OpenRouter GPT matching is narrower than the direct-provider GPT block.

The OpenRouter section only matches gpt-4o and gpt-4-turbo (line 228), while the direct OpenAI block (lines 141-165) also covers GPT-5.x variants. If a user routes a GPT-5 model through OpenRouter, it will fall to the 200k default instead of the actual 272k–400k window. Not a bug per se — 200k is a safe underestimate — but worth noting for future model additions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 220 - 243,
Update the OpenRouter model-matching branch in ai-context-calculator.ts to
include GPT-5 variants so routed GPT-5 models get the same token window as the
direct-provider GPT logic: add checks against modelLower for strings like
"gpt-5", "gpt-5o", "gpt-5-turbo" (and any project-specific GPT-5 naming you
supported in the OpenAI block) and return the same token limit used for GPT-5 in
the direct OpenAI branch (use the same numeric constant there, e.g., 272_000 or
the exact constant used in the OpenAI block) instead of falling through to the
200_000 default; modify the if chain near the other GPT checks (the block
referencing providerLower === 'openrouter' and modelLower) to include these new
condition(s).
apps/web/src/lib/ai/shared/error-messages.ts (1)

47-62: isRateLimitError overlaps with context-length phrases but is currently unused—consider adding defensive logic for future safety.

While isRateLimitError checks for the "limit" substring (line 71), which overlaps with "token limit" from isContextLengthError, this function is not currently called anywhere in the codebase. The code is safe today because getAIErrorMessage doesn't use isRateLimitError and checks context length first. However, if isRateLimitError is ever used independently in the future, it would misclassify context-length errors. Adding a guard to exclude context-length errors would be defensive refactoring.

♻️ Suggested improvement
 export function isRateLimitError(errorMessage: string | undefined): boolean {
   if (!errorMessage) return false;
+  // Exclude context-length errors that also contain "limit"
+  if (isContextLengthError(errorMessage)) return false;
   return (
     errorMessage.toLowerCase().includes('rate') ||
     errorMessage.toLowerCase().includes('limit') ||
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/lib/ai/shared/error-messages.ts` around lines 47 - 62,
isRateLimitError can misclassify context-length issues because it looks for
generic "limit" substrings; update isRateLimitError to early-return false when
isContextLengthError(errorMessage) is true (or otherwise explicitly exclude
context-length phrases) so rate-limit detection never fires for
context/window/token-limit errors — locate the isRateLimitError and
isContextLengthError functions and add the defensive guard in isRateLimitError
(or add an explicit negative-condition for tokens/context/window phrases) to
ensure the two checks don't overlap.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 829-835: The token estimator (estimateTokens in
ai-context-calculator.ts) uses a fixed 4-chars-per-token heuristic which
undercounts CJK text; update estimateTokens to detect non-ASCII/CJK content
(e.g., check for Unicode ranges or presence of multibyte chars) and use a more
conservative ratio (e.g., 2 chars/token) or make the ratio configurable, then
recalculate calls that rely on it (estimateSystemPromptTokens,
estimateToolDefinitionTokens, and any callers in route.ts) so truncation
reserves enough budget for CJK; keep the existing fallback error handler but
prefer preventing rejections by selecting the conservative ratio when
non-ASCII/CJK is detected.
- Around line 1241-1250: The code currently returns raw provider error text via
errorMsg when isContextLengthError(...) triggers; replace that leak by returning
a sanitized constant or summarized detail (e.g., details: 'context length
exceeded') instead of the raw errorMsg, or call a small sanitizer function
(e.g., sanitizeProviderMessage(errorMsg)) before including it; update the
NextResponse.json call in the isContextLengthError branch to use the
sanitized/constant detail and ensure the user-facing message string "Older
messages have been trimmed to fit — try sending your message again" (from the
proactive truncation logic around the proactive truncation block) is not used if
truncation already ran and likely won’t help—either remove that retry suggestion
or make it conditional based on whether truncation actually occurred.


- Fix TypeScript build error: cast includedMessages to UIMessage[] for
  convertToModelMessages compatibility
- Improve CJK token estimation: use 2 chars/token when >20% non-ASCII
  content detected (vs 4 chars/token for Latin text)
- Sanitize 413 error response: replace raw provider error with constant,
  make retry suggestion conditional on whether truncation already ran
- Add isContextLengthError guard to isRateLimitError to prevent
  misclassification of context-length errors as rate-limit errors
- Add GPT-5 variant matching to OpenRouter section of getContextWindowSize

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
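A rough reconstruction of the revised CJK-aware heuristic described in the bullets above, using the ~4 chars/token Latin ratio, the ~2 chars/token non-ASCII ratio, and a 20% threshold. This sketch counts code points rather than using a control-character regex; the exact code in ai-context-calculator.ts may differ.

```typescript
// Sketch of the revised estimateTokens(); thresholds and ratios follow
// the commit message, but the implementation details are assumptions.
function estimateTokens(text: string): number {
  if (!text) return 0;
  let nonAscii = 0;
  // Iterate code points so surrogate pairs (emoji) count once each.
  for (const ch of text) {
    if ((ch.codePointAt(0) ?? 0) > 0x7f) nonAscii++;
  }
  const total = [...text].length;
  // CJK and emoji encode far fewer characters per token than Latin text.
  const charsPerToken = nonAscii / total > 0.2 ? 2 : 4;
  return Math.ceil(total / charsPerToken);
}
```

For example, a 10-character Chinese string estimates at 5 tokens rather than the 3 the old fixed 4-chars/token heuristic would produce.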
2witstudios (Owner, Author)

Addressed all CodeRabbit review feedback in 8a5e5d5

Actionable comments (fixed):

  1. CJK token estimation (ai-context-calculator.ts): estimateTokens() now detects non-ASCII-heavy text and uses a 2 chars/token ratio (vs 4 for Latin), preventing undercounting for CJK workloads.
  2. Raw provider error leak (route.ts): 413 response now returns a sanitized details: 'context_length_exceeded' constant. The user-facing message is conditional — suggests starting a new conversation if truncation already ran (retry would likely fail), or suggests retrying if truncation hasn't been applied yet.

Nitpick comments (fixed):

  1. isRateLimitError overlap (error-messages.ts): Added isContextLengthError() guard so context-length errors containing "limit" aren't misclassified as rate-limit errors.
  2. OpenRouter GPT-5 matching (ai-context-calculator.ts): Added GPT-5.x variant matching to the OpenRouter section, mirroring the direct-provider GPT block (5.0/5.1/5.2 with mini/nano variants).

Build fix:

  1. TypeScript build error (route.ts:856): Cast includedMessages to SDK UIMessage[] to resolve type mismatch between the calculator's looser UIMessage interface and Vercel AI SDK's convertToModelMessages signature.

The wasTruncated variable was declared inside the try block but
referenced in the outer catch block for conditional error messaging.
Hoist it to function-level scope so the 413 error response can
correctly determine whether truncation already occurred.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
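The hoist described in this commit message can be illustrated with a minimal sketch; the function name, messages, and simulated error below are stand-ins, not the actual route.ts code.

```typescript
// Sketch of the scoping fix: wasTruncated is declared before the try
// block so the catch handler can branch on it when building the 413.
function handleChat(simulateTruncation: boolean): { status: number; message: string } {
  let wasTruncated = false; // hoisted to function scope, not inside try
  try {
    wasTruncated = simulateTruncation;
    // ...streaming call would go here; simulate a provider context error
    throw new Error("maximum context length exceeded");
  } catch {
    return {
      status: 413,
      message: wasTruncated
        ? "Conversation still exceeds the context window even after trimming."
        : "Conversation is too long for this model.",
    };
  }
}
```

Had wasTruncated stayed inside the try block, the catch clause would not compile (TypeScript would report the identifier as out of scope).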
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🧹 Nitpick comments (1)
packages/lib/src/monitoring/ai-context-calculator.ts (1)

55-66: Consider consolidating token estimation logic into one shared implementation.

estimateTokens here now differs from packages/lib/src/monitoring/ai-monitoring.ts:404-407 (still fixed 4 chars/token). Keeping two heuristics will drift truncation vs. monitoring metrics over time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 55 - 66,
The repository now has two different token-estimation heuristics (estimateTokens
in ai-context-calculator.ts and the fixed 4-chars/token logic in
ai-monitoring.ts) which can cause drift; replace the duplicated logic by
extracting a single shared utility (e.g., export a consolidateEstimateTokens
function) and import it from both places so both monitoring and context
truncation use the same algorithm (include the non-ASCII detection and
charsPerToken decision currently in estimateTokens). Update references to call
the shared function (estimateTokens / consolidateEstimateTokens) and remove the
old inline heuristic in ai-monitoring.ts to ensure consistent token counting
across the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 1249-1252: The error message for the non-truncated branch
incorrectly implies older messages were trimmed; update the logic that builds
the error response (the object using wasTruncated, message, and details in
route.ts) so when wasTruncated is false the message does not say messages were
trimmed—instead return a clear statement like "The conversation is too long for
this model's context window. Please start a new conversation or shorten the
conversation" while preserving details: 'context_length_exceeded'.
- Around line 837-844: After calling determineMessagesToInclude, add a guard
that ensures the current/latest user turn was not dropped: find the last user
message in sanitizedMessages (e.g., lastUserIndex or by role === 'user') and
verify includedMessages is non-empty and contains that message (compare id or
index); if it doesn't, return a clear client error (400) or an explicit error
response indicating the user input exceeds the token budget instead of
proceeding. Update the logic around determineMessagesToInclude /
includedMessages / wasTruncated in route.ts to perform this check and
short-circuit with a helpful error when the latest user turn would be removed.

In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 225-255: The OpenRouter-specific model limit block is being
bypassed because earlier generic model-name branches match first; update those
earlier checks to exclude OpenRouter by adding a guard like providerLower !==
'openrouter' to the generic model-name conditionals (the branches that inspect
modelLower for 'gpt', 'claude', 'gemini', etc.) or alternatively move the
OpenRouter block earlier so it runs before the generic branches; locate the
providerLower and modelLower checks in ai-context-calculator.ts and adjust the
conditionals around the generic model checks or reposition the OpenRouter if
(providerLower === 'openrouter') block so OpenRouter-specific mappings are
applied correctly.
- Line 60: The regex /[^\x00-\x7F]/g used to compute nonAsciiCount violates the
linter; replace that regex-based approach with explicit code-point checking:
iterate over the string (e.g., using for...of or Array.from) and count
characters whose codePointAt(0) > 0x7F, then store that count in nonAsciiCount.
Update the expression that computes nonAsciiCount accordingly so it handles full
Unicode code points (surrogate pairs) and removes the problematic regex.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86c2009 and 72fb8d9.

📒 Files selected for processing (3)
  • apps/web/src/app/api/ai/chat/route.ts
  • apps/web/src/lib/ai/shared/error-messages.ts
  • packages/lib/src/monitoring/ai-context-calculator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/web/src/lib/ai/shared/error-messages.ts

Comment on lines +837 to +844
const truncationResult = determineMessagesToInclude(
sanitizedMessages,
inputBudget,
systemPromptTokens,
toolTokens
);
const { includedMessages } = truncationResult;
wasTruncated = truncationResult.wasTruncated;

⚠️ Potential issue | 🟠 Major

Guard against truncation removing the current user turn entirely.

If the latest message alone exceeds budget, includedMessages can be empty. The call then proceeds with no user prompt, producing an unrelated assistant response instead of a clear error.

✅ Suggested fix
     const { includedMessages } = truncationResult;
     wasTruncated = truncationResult.wasTruncated;
+
+    // If even the newest user turn cannot fit, fail fast with context-length error
+    if (includedMessages.length === 0) {
+      return NextResponse.json(
+        {
+          error: 'context_length_exceeded',
+          message: 'Your latest message is too large for this model context window. Please shorten it or start a new conversation.',
+          details: 'context_length_exceeded',
+        },
+        { status: 413 }
+      );
+    }

Comment on lines +1249 to +1252
message: wasTruncated
? 'The conversation still exceeds this model\'s context window even after trimming. Please start a new conversation.'
: 'The conversation is too long for this model\'s context window. Older messages have been trimmed — try sending your message again.',
details: 'context_length_exceeded',

⚠️ Potential issue | 🟡 Minor

The non-truncated error message is misleading.

When wasTruncated is false, the response currently says older messages “have been trimmed,” but no trimming occurred in that branch.

✅ Suggested wording update
-            : 'The conversation is too long for this model\'s context window. Older messages have been trimmed — try sending your message again.',
+            : 'The conversation is too long for this model\'s context window. Please try again or start a new conversation.',

return Math.ceil(text.length / 4);
// Detect non-ASCII heavy content (CJK, emoji, etc.)
// CJK Unified Ideographs, Hiragana, Katakana, Hangul, etc.
const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read the content around line 60
fd "ai-context-calculator.ts" --type f

Repository: 2witstudios/PageSpace

Length of output: 117


🏁 Script executed:

# Once we confirm the file exists, read it to verify the exact content at line 60
cat -n packages/lib/src/monitoring/ai-context-calculator.ts | sed -n '55,65p'

Repository: 2witstudios/PageSpace

Length of output: 636


🏁 Script executed:

# Check for linter configuration in the packages/lib directory
find packages/lib -maxdepth 3 -name "*.json" -o -name "*.mjs" -o -name "biome.json" -o -name ".eslintrc*" | head -20

Repository: 2witstudios/PageSpace

Length of output: 150


🏁 Script executed:

# Check if Biome is configured at the root or in packages
find . -maxdepth 2 -name "biome.json" -o -name "biome.jsonc"

Repository: 2witstudios/PageSpace

Length of output: 77


🏁 Script executed:

# Read the biome.json configuration
cat biome.json

Repository: 2witstudios/PageSpace

Length of output: 1839


🏁 Script executed:

# Check the suggested fix in the review comment - verify if the logic is equivalent
python3 << 'EOF'
# Test if the suggested fix produces the same results as the original

# Original: text.match(/[^\x00-\x7F]/g) matches all non-ASCII characters
# Suggested: reduce with codePointAt check for > 0x7f

test_strings = [
    "hello",  # All ASCII
    "hello🌍",  # ASCII + emoji
    "你好",  # Chinese characters
    "café",  # ASCII + accented char
    "مرحبا",  # Arabic
]

for test_str in test_strings:
    # Simulate original regex behavior
    original_count = len([c for c in test_str if ord(c) > 0x7f])
    
    # Simulate suggested fix behavior
    suggested_count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            suggested_count += 1
    
    print(f"String: '{test_str}' | Original: {original_count} | Suggested: {suggested_count} | Match: {original_count == suggested_count}")
EOF

Repository: 2witstudios/PageSpace

Length of output: 357


🏁 Script executed:

# Check if noControlCharactersInRegex is part of Biome's recommended rules
# and verify if it would be active with the current config
web_search "Biome noControlCharactersInRegex rule"

Repository: 2witstudios/PageSpace

Length of output: 115


🏁 Script executed:

# Search for other similar patterns that might have the same issue
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts --type=tsx packages/

Repository: 2witstudios/PageSpace

Length of output: 92


🏁 Script executed:

# Verify the suggested fix doesn't have any edge cases with surrogate pairs or multi-byte chars
python3 << 'EOF'
# Test edge cases with the suggested fix

test_cases = [
    ("hello", 0),  # All ASCII
    ("café", 1),  # Single accented character
    ("🌍", 1),  # Single emoji (surrogate pair in JS)
    ("你好世界", 4),  # Chinese characters
    ("hello🌍world", 1),  # Mixed ASCII and emoji
    ("\x00\x01\x7f", 0),  # Control characters at edge of range
    ("", 0),  # Empty string
]

for test_str, expected in test_cases:
    # Suggested fix logic
    count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            count += 1
    
    status = "✓" if count == expected else "✗"
    print(f"{status} String: {repr(test_str)} | Expected: {expected} | Got: {count}")
EOF

Repository: 2witstudios/PageSpace

Length of output: 355


🏁 Script executed:

# Search for control character patterns in the codebase using ts file type only
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts packages/

Repository: 2witstudios/PageSpace

Length of output: 2413


🏁 Script executed:

# Verify Biome rule documentation is available and check if recommended includes the suspicious rule
python3 << 'EOF'
# Check Biome's default rule settings for noControlCharactersInRegex
# Based on the biome.json, the "suspicious" category doesn't explicitly disable the rule,
# which means it uses the recommended default

biome_config = {
    "suspicious": {
        "noArrayIndexKey": "off",
        "noExplicitAny": "off",
        "noImplicitAnyLet": "off",
        "noDoubleEquals": "off",
        "noGlobalIsNan": "off",
        "noAssignInExpressions": "off"
        # noControlCharactersInRegex is NOT listed, so it uses the default (recommended = true for suspicious)
    }
}

print("Analysis of biome.json configuration:")
print("- Linter is enabled: true")
print("- 'recommended' is enabled globally for linter rules")
print("- 'suspicious' category does NOT explicitly disable 'noControlCharactersInRegex'")
print("- Therefore, the rule IS active by default")
print("\nThe regex pattern /[^\\x00-\\x7F]/ contains control character escapes,")
print("which would trigger the 'noControlCharactersInRegex' rule in Biome.")
EOF

Repository: 2witstudios/PageSpace

Length of output: 443


Replace the regex pattern to resolve the noControlCharactersInRegex Biome linter violation.

The pattern /[^\x00-\x7F]/g at line 60 triggers Biome's suspicious/noControlCharactersInRegex rule, which is enabled by default. Use explicit code point checking instead:

Suggested fix
-  const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
+  const nonAsciiCount = [...text].reduce((count, char) => {
+    const codePoint = char.codePointAt(0) ?? 0;
+    return count + (codePoint > 0x7f ? 1 : 0);
+  }, 0);
🧰 Tools
🪛 Biome (2.4.4)

[error] 60-60: Unexpected control character in a regular expression.

(lint/suspicious/noControlCharactersInRegex)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/monitoring/ai-context-calculator.ts` at line 60, the regex
/[^\x00-\x7F]/g used to compute nonAsciiCount violates the linter. Replace the
regex-based approach with explicit code-point checking: iterate over the string
(e.g., using for...of or Array.from), count characters whose codePointAt(0) >
0x7F, and store that count in nonAsciiCount. This handles full Unicode code
points (surrogate pairs) and removes the problematic regex.

Comment on lines +225 to +255
// OpenRouter - use model-specific limits where known, else 200k conservative default
if (providerLower === 'openrouter') {
  // Claude models via OpenRouter
  if (modelLower.includes('claude')) return 200_000;
  // Gemini models via OpenRouter
  if (modelLower.includes('gemini-2.5')) return 1_000_000;
  if (modelLower.includes('gemini-2.0') || modelLower.includes('gemini-1.5')) return 1_000_000;
  // GPT models via OpenRouter
  if (modelLower.includes('gpt-5.2')) {
    return modelLower.includes('mini') || modelLower.includes('nano') ? 256_000 : 400_000;
  }
  if (modelLower.includes('gpt-5.1')) return 400_000;
  if (modelLower.includes('gpt-5')) {
    return modelLower.includes('mini') || modelLower.includes('nano') ? 128_000 : 272_000;
  }
  if (modelLower.includes('gpt-4o') || modelLower.includes('gpt-4-turbo')) return 128_000;
  // DeepSeek models - commonly 64k or 128k
  if (modelLower.includes('deepseek-r1') || modelLower.includes('deepseek-v3')) return 128_000;
  if (modelLower.includes('deepseek')) return 64_000;
  // Qwen models
  if (modelLower.includes('qwen-2.5') || modelLower.includes('qwq')) return 128_000;
  if (modelLower.includes('qwen')) return 32_000;
  // Llama models
  if (modelLower.includes('llama-3') || modelLower.includes('llama3')) return 128_000;
  if (modelLower.includes('llama')) return 32_000;
  // Mistral models
  if (modelLower.includes('mistral-large') || modelLower.includes('mistral-nemo')) return 128_000;
  if (modelLower.includes('mistral')) return 32_000;
  // OpenRouter platform hard cap is 400k for many endpoints - use 200k as safe default
  return 200_000;
}
⚠️ Potential issue | 🟠 Major

OpenRouter-specific limits are partially shadowed by earlier branches.

For OpenRouter models, checks at Line 146 / Line 173 / Line 184 can match first via model name (gpt, claude, gemini), so this block won’t run for many of the models it intends to handle. That defeats the provider-specific conservative mapping.

✅ Suggested fix direction
 export function getContextWindowSize(model: string, provider?: string): number {
   const providerLower = provider?.toLowerCase() || '';
   const modelLower = model.toLowerCase();

+  // OpenRouter first to ensure provider-specific limits are applied
+  if (providerLower === 'openrouter') {
+    // ...existing OpenRouter mapping...
+    return 200_000;
+  }
+
   // OpenAI models
-  if (providerLower === 'openai' || modelLower.includes('gpt')) {
+  if (providerLower === 'openai' || (!providerLower && modelLower.includes('gpt'))) {
     ...
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 225-255,
the OpenRouter-specific model limit block is being bypassed because earlier
generic model-name branches match first. Update those earlier checks to exclude
OpenRouter by adding a guard like providerLower !== 'openrouter' to the generic
model-name conditionals (the branches that inspect modelLower for 'gpt',
'claude', 'gemini', etc.), or alternatively move the OpenRouter block earlier so
it runs before the generic branches. Locate the providerLower and modelLower
checks in ai-context-calculator.ts and either adjust the conditionals around the
generic model checks or reposition the if (providerLower === 'openrouter')
block so OpenRouter-specific mappings are applied correctly.
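The reordering the review suggests can be sketched as a standalone function. The branches and limits below are illustrative stand-ins, not the file's full mapping; the point is only the dispatch order:

```typescript
// Minimal sketch of provider-first dispatch. Checking the provider before
// the generic model-name branches ensures OpenRouter-specific limits are
// not shadowed by the 'gpt'/'claude' matches. Limits here are illustrative.
function getContextWindowSize(model: string, provider?: string): number {
  const providerLower = provider?.toLowerCase() ?? '';
  const modelLower = model.toLowerCase();

  // OpenRouter first, so its conservative mapping always applies
  if (providerLower === 'openrouter') {
    if (modelLower.includes('claude')) return 200_000;
    if (modelLower.includes('gpt-4o')) return 128_000;
    return 200_000; // safe default for unrecognized OpenRouter models
  }

  // Generic branches only run when the provider is not OpenRouter
  if (providerLower === 'openai' || modelLower.includes('gpt')) return 128_000;
  if (providerLower === 'anthropic' || modelLower.includes('claude')) return 200_000;
  return 32_000; // conservative fallback
}
```

With this ordering, `getContextWindowSize('openai/gpt-4o', 'openrouter')` hits the OpenRouter branch instead of the generic `gpt` branch, which is exactly the shadowing the comment flags.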

2witstudios added a commit that referenced this pull request Feb 26, 2026
Reviews proactive context window management for AI chat, covering token
estimation accuracy, OpenRouter model registry, error handling, type
safety, and OWASP compliance. Recommends adding tests and verifying
error propagation before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2witstudios added a commit that referenced this pull request Feb 26, 2026
- Fix wasTruncated message inversion: non-truncated error no longer
  claims messages were trimmed
- Guard empty messages: return 413 when truncation leaves zero messages
  instead of sending empty context to streamText
- Fix estimateMessageTokens: match actual Vercel AI SDK tool-{name}
  part format (input/output/state) instead of legacy tool-call/tool-result
- Reconcile context windows: extract MODEL_CONTEXT_WINDOWS into shared
  model-context-windows.ts as single source of truth (fixes glm-5
  200000→202752, minimax m2.5 via OpenRouter 1M→204800 divergence)
- Fix isContextLengthError: replace broad '413' string match with
  specific HTTP status pattern to prevent false positives
- Fix estimateTokens: replace regex array allocation with efficient
  charCode loop for CJK detection
- Add 42 unit tests for isContextLengthError, estimateTokens,
  estimateMessageTokens, getContextWindowSize, determineMessagesToInclude

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
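The narrowed `isContextLengthError` check described in the commit above might look like the following sketch. The keyword list and the message-only signature are assumptions; the PR's implementation may inspect structured error fields as well:

```typescript
// Sketch: detect context-length errors without treating any string that
// merely contains "413" as a match. A 413 only counts when paired with a
// context/token keyword, to avoid false positives like request IDs.
function isContextLengthError(message: string): boolean {
  const lower = message.toLowerCase();
  const hasContextKeyword =
    lower.includes('context length') ||
    lower.includes('context window') ||
    lower.includes('maximum context') ||
    lower.includes('token limit') ||
    lower.includes('too many tokens');
  const has413Status =
    /\b413\b/.test(lower) &&
    (lower.includes('context') || lower.includes('token'));
  return hasContextKeyword || has413Status;
}
```

Under this scheme a bare `413 Payload Too Large` or an ID like `41321` no longer triggers the context-length path, matching the "require context/token keywords" intent in the commit message.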
2witstudios added a commit that referenced this pull request Feb 26, 2026
- Fix inverted wasTruncated message logic in catch block 413 response
- Guard against empty includedMessages after truncation (return 413 early)
- Handle tool-{toolName} part format in estimateMessageTokens (PageSpace DB format)
- Extract MODEL_CONTEXT_WINDOWS to shared model-context-windows.ts (single source of truth)
- Narrow isContextLengthError '413' check to require context/token keywords
- Replace regex with for-loop counter in estimateTokens for memory efficiency
- Add unit tests for isContextLengthError and CJK-aware estimateTokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
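The regex-to-loop change for `estimateTokens` might look like this sketch. The 4-characters-per-token heuristic, the one-token-per-CJK-character weighting, and the specific Unicode ranges are assumptions for illustration, not the PR's exact constants:

```typescript
// Sketch of a CJK-aware token estimator using a charCode loop instead of a
// regex match that allocates an intermediate array. Constants are illustrative.
function estimateTokens(text: string): number {
  let cjkCount = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // CJK Unified Ideographs plus Hiragana/Katakana - a common, not
    // exhaustive, approximation of "dense" scripts
    if ((code >= 0x4e00 && code <= 0x9fff) || (code >= 0x3040 && code <= 0x30ff)) {
      cjkCount++;
    }
  }
  const asciiLikeCount = text.length - cjkCount;
  // Roughly 4 chars/token for Latin-ish text, ~1 token per CJK character
  return Math.ceil(asciiLikeCount / 4) + cjkCount;
}
```

The loop touches each code unit once and allocates nothing, which is the memory-efficiency point the commit makes about replacing `text.match(...)`.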