feat(knowledge): allow Excel file indexing with 2MB size limit by parabala · Pull Request #805 · wecode-ai/Wegent

parabala · 2026-03-20T09:23:39Z

Remove .xls and .xlsx from RAG_INDEXING_DISABLED_EXTENSIONS
Add EXCEL_FILE_SIZE_LIMIT (2MB) and EXCEL_EXTENSIONS constants
Add file_size parameter to get_rag_indexing_skip_reason function
Return error code format for i18n support: EXCEL_FILE_SIZE_EXCEEDED|ext|limit|size
Update orchestrator to pass file_size to indexing functions
Update frontend DocumentItem to show 'Indexing' status when reindexing
Show 'Index Unavailable' when not reindexing and is_active=false
Add i18n translations for Excel file size exceeded error (zh-CN/en)
Update tests to verify Excel files within 2MB limit are allowed

Summary by CodeRabbit

New Features
- Excel files are now indexable when ≤2MB; files >2MB are skipped with a size-specific reason.
- Added localized messages (EN/ZH) that include extension, limit, and actual size for size-exceeded errors.
Bug Fixes / UX
- Reindexing failure toasts and document tooltips now surface the Excel size-limit reason and show a clear, formatted message.
Tests
- Updated tests to reflect size-dependent Excel indexing behavior.

coderabbitai · 2026-03-20T09:23:57Z

📝 Walkthrough

Walkthrough

Excel files are no longer unconditionally excluded by extension; indexing skips Excel only when file_size is known and exceeds a 2MB limit. Backend APIs, orchestrator flows, frontend UI, tests, and i18n were updated to propagate and surface this size-based skip reason.

Changes

Cohort / File(s)	Summary
Backend Indexing Logic `backend/app/services/knowledge/indexing.py`	Replace extension-only exclusion with size check. Add `EXCEL_FILE_SIZE_LIMIT` (2MB) and `EXCEL_EXTENSIONS`. Update `get_rag_indexing_skip_reason(source_type, file_extension, file_size)` to return `EXCEL_FILE_SIZE_EXCEEDED
Backend Orchestrator Integration `backend/app/services/knowledge/orchestrator.py`	Pass `file_size` into `get_rag_indexing_skip_reason(...)` for create/index and reindex flows (uses `data.file_size` or `document.file_size`).
Backend Tests `backend/tests/services/knowledge/test_orchestrator.py`	Adjust tests to treat Excel skipping as size-dependent (>2MB). Fixtures set file_size=3MB where expecting skip/error; assertions updated to expect `EXCEL_FILE_SIZE_EXCEEDED` token.
Frontend Document Components `frontend/src/features/knowledge/document/components/DocumentItem.tsx`, `frontend/src/features/knowledge/document/components/DocumentList.tsx`	Detect oversized Excel files (`isExcelExceedingSizeLimit`) and show Excel-specific tooltip. `DocumentList` parses `EXCEL_FILE_SIZE_EXCEEDED
Frontend Internationalization `frontend/src/i18n/locales/en/knowledge.json`, `frontend/src/i18n/locales/zh-CN/knowledge.json`	Add `document.excelFileSizeExceeded` translation keys with `{{extension}}`, `{{limit}}`, and `{{size}}` placeholders.

Sequence Diagram(s)

sequenceDiagram
  participant UI as DocumentList / DocumentItem (Frontend)
  participant Orch as Orchestrator (Backend)
  participant IndexSvc as Indexing Service
  participant DB as KnowledgeDocument (DB)

  UI->>Orch: request reindex / trigger indexing
  Orch->>DB: fetch metadata (file_extension, file_size)
  Orch->>IndexSvc: start indexing (passes source_type, file_extension, file_size)
  IndexSvc->>IndexSvc: get_rag_indexing_skip_reason(source_type, file_extension, file_size)
  alt Excel and file_size > 2MB
    IndexSvc-->>Orch: EXCEL_FILE_SIZE_EXCEEDED|ext|limit|size (skip)
    Orch-->>UI: return error token (frontend shows localized toast)
  else allowed (size unknown or <= limit)
    IndexSvc->>IndexSvc: schedule/perform indexing
    IndexSvc-->>Orch: indexing scheduled/started
    Orch-->>UI: success
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰🥕 I hopped through sheets both wide and deep,
Two megs the fence I now must keep.
If XLSX grows beyond the gate,
A kindly hop will mark its fate.
Shrink or split — then I will leap!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main feature: enabling Excel file indexing with a 2MB size limit, which aligns with the core change across backend and frontend.
Docstring Coverage	✅ Passed	Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can disable sequence diagrams in the walkthrough.

Disable the reviews.sequence_diagrams setting to disable sequence diagrams in the walkthrough.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/services/knowledge/indexing.py (1)
502-529: ⚠️ Potential issue | 🟠 Major

Don't rely solely on KnowledgeDocument.file_size for the Excel gate.

When document_id is present, this path never consults the attachment size. Older rows can still have file_size as NULL/0, so the new check fails open and a >2MB Excel file can still be indexed/reindexed even though attachment_id is available here. Please fall back to attachment metadata before calling get_rag_indexing_skip_reason(), and fail closed for Excel if neither size is trustworthy.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/knowledge/indexing.py` around lines 502 - 529, When
resolving file metadata before calling get_rag_indexing_skip_reason, don't trust
KnowledgeDocument.file_size alone: in the document_id branch (KnowledgeDocument)
if document.file_size is None or 0 and attachment_id is present, query
SubtaskContext (filtering ContextType.ATTACHMENT) and override
file_size/file_extension with the attachment's values; otherwise if after both
checks file_size is still missing/zero and source_type indicates Excel treat
size as "too large" (fail closed) so get_rag_indexing_skip_reason will skip
indexing. Ensure you still populate source_type from KnowledgeDocument when
available and only fall back to attachment metadata for
file_size/file_extension.

🧹 Nitpick comments (1)

frontend/src/features/knowledge/document/components/DocumentItem.tsx (1)
126-129: Extract the Excel limit/tooltip logic into a shared helper.

The 2MB policy is duplicated here in both layouts (EXCEL_FILE_SIZE_LIMIT, limit: 2, and the repeated size formatting). A small helper/shared constant would keep the compact and table UIs in sync and reduce drift from the backend policy over time.

Also applies to: 234-239, 442-447
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around
lines 126 - 129, The Excel size-limit logic (EXCEL_FILE_SIZE_LIMIT, isExcel,
isExcelExceedingSizeLimit and the repeated formatted "2 MB" text) is duplicated
across multiple UI components; extract a single shared helper that exports the
numeric limit constant (e.g., EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024) and
utility functions like isExcel(fileExtension) and isExcelExceedingLimit(file)
plus a formattedLimit getter used by tooltips; replace local uses of
EXCEL_FILE_SIZE_LIMIT, isExcel and isExcelExceedingSizeLimit in DocumentItem and
the other occurrences with imports from that helper so both compact and table
UIs share the same source of truth.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/tests/services/knowledge/test_orchestrator.py`:
- Around line 422-426: The test
test_excel_documents_within_size_limit_are_allowed never exercises the size
check because it omits the file_size argument; update the test to call
get_rag_indexing_skip_reason("file", "xlsx", file_size=2 * 1024 * 1024) so the
2MB boundary is evaluated and assert the result is None, and optionally add an
additional assertion for a size just above the limit (e.g., file_size=2 * 1024 *
1024 + 1) to verify the skip reason is returned.

In `@frontend/src/features/knowledge/document/components/DocumentList.tsx`:
- Around line 382-400: In the catch block inside DocumentList.tsx that builds
the toast error (the code setting errorMessage and using
t('document.document.reindexFailed')), remove the else branch that assigns the
raw backend message (errorMessage = message) so we do not surface unlocalized
server text; keep the existing EXCEL_FILE_SIZE_EXCEEDED parsing and i18n
mapping, and for any other error leave errorMessage as the generic
t('document.document.reindexFailed') fallback unless you add explicit
error-code-to-i18n mappings later.

---

Outside diff comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 502-529: When resolving file metadata before calling
get_rag_indexing_skip_reason, don't trust KnowledgeDocument.file_size alone: in
the document_id branch (KnowledgeDocument) if document.file_size is None or 0
and attachment_id is present, query SubtaskContext (filtering
ContextType.ATTACHMENT) and override file_size/file_extension with the
attachment's values; otherwise if after both checks file_size is still
missing/zero and source_type indicates Excel treat size as "too large" (fail
closed) so get_rag_indexing_skip_reason will skip indexing. Ensure you still
populate source_type from KnowledgeDocument when available and only fall back to
attachment metadata for file_size/file_extension.

---

Nitpick comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 126-129: The Excel size-limit logic (EXCEL_FILE_SIZE_LIMIT,
isExcel, isExcelExceedingSizeLimit and the repeated formatted "2 MB" text) is
duplicated across multiple UI components; extract a single shared helper that
exports the numeric limit constant (e.g., EXCEL_FILE_SIZE_LIMIT = 2 * 1024 *
1024) and utility functions like isExcel(fileExtension) and
isExcelExceedingLimit(file) plus a formattedLimit getter used by tooltips;
replace local uses of EXCEL_FILE_SIZE_LIMIT, isExcel and
isExcelExceedingSizeLimit in DocumentItem and the other occurrences with imports
from that helper so both compact and table UIs share the same source of truth.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 35414538-0e85-41ef-8b70-84e004b3c793

📥 Commits

Reviewing files that changed from the base of the PR and between 45958dc and d54e2bb.

📒 Files selected for processing (7)

backend/app/services/knowledge/indexing.py
backend/app/services/knowledge/orchestrator.py
backend/tests/services/knowledge/test_orchestrator.py
frontend/src/features/knowledge/document/components/DocumentItem.tsx
frontend/src/features/knowledge/document/components/DocumentList.tsx
frontend/src/i18n/locales/en/knowledge.json
frontend/src/i18n/locales/zh-CN/knowledge.json

coderabbitai · 2026-03-20T09:33:52Z

backend/tests/services/knowledge/test_orchestrator.py

+    def test_excel_documents_within_size_limit_are_allowed(self):
+        """Test Excel extensions within 2MB size limit are allowed for RAG indexing."""
        reason = get_rag_indexing_skip_reason("file", "xlsx")

-        assert reason == "Excel documents (.xlsx) are excluded from RAG indexing"
+        assert reason is None


⚠️ Potential issue | 🟡 Minor

This test never hits the "within 2MB" branch.

Omitting file_size makes get_rag_indexing_skip_reason() return None regardless of the workbook size, so the new size boundary is not exercised here. Pass 2 * 1024 * 1024 explicitly, and ideally add an exact-boundary assertion.

🧪 Suggested fix

- reason = get_rag_indexing_skip_reason("file", "xlsx") + reason = get_rag_indexing_skip_reason("file", "xlsx", 2 * 1024 * 1024)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/tests/services/knowledge/test_orchestrator.py` around lines 422 - 426, The test test_excel_documents_within_size_limit_are_allowed never exercises the size check because it omits the file_size argument; update the test to call get_rag_indexing_skip_reason("file", "xlsx", file_size=2 * 1024 * 1024) so the 2MB boundary is evaluated and assert the result is None, and optionally add an additional assertion for a size just above the limit (e.g., file_size=2 * 1024 * 1024 + 1) to verify the skip reason is returned.

coderabbitai · 2026-03-20T09:33:52Z

frontend/src/features/knowledge/document/components/DocumentList.tsx

+    } catch (err) {
+      // Parse error message and use i18n translation if it's a known error code
+      let errorMessage = t('document.document.reindexFailed')
+      if (err instanceof Error) {
+        const message = err.message
+        // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code
+        // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size
+        if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) {
+          const parts = message.split('|')
+          if (parts.length === 4) {
+            errorMessage = t('document.document.excelFileSizeExceeded', {
+              extension: parts[1],
+              limit: parts[2],
+              size: parts[3],
+            })
+          }
+        } else {
+          // Use the original error message for other errors
+          errorMessage = message


⚠️ Potential issue | 🟡 Minor

Don't surface raw backend errors in the reindex toast.

Only EXCEL_FILE_SIZE_EXCEEDED is mapped to i18n here. The else branch now shows raw backend messages for every other failure, which regresses localization and leaks server wording straight into user-facing toasts. Keep the generic reindexFailed fallback unless you add explicit error-code mappings for those cases.

💡 Suggested change

- } else { - // Use the original error message for other errors - errorMessage = message }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

} catch (err) {

// Parse error message and use i18n translation if it's a known error code

let errorMessage = t('document.document.reindexFailed')

if (err instanceof Error) {

const message = err.message

// Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code

// Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size

if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) {

const parts = message.split('|')

if (parts.length === 4) {

errorMessage = t('document.document.excelFileSizeExceeded', {

extension: parts[1],

limit: parts[2],

size: parts[3],

})

}

} else {

// Use the original error message for other errors

errorMessage = message

} catch (err) {

// Parse error message and use i18n translation if it's a known error code

let errorMessage = t('document.document.reindexFailed')

if (err instanceof Error) {

const message = err.message

// Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code

// Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size

if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) {

const parts = message.split('|')

if (parts.length === 4) {

errorMessage = t('document.document.excelFileSizeExceeded', {

extension: parts[1],

limit: parts[2],

size: parts[3],

})

}

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@frontend/src/features/knowledge/document/components/DocumentList.tsx` around lines 382 - 400, In the catch block inside DocumentList.tsx that builds the toast error (the code setting errorMessage and using t('document.document.reindexFailed')), remove the else branch that assigns the raw backend message (errorMessage = message) so we do not surface unlocalized server text; keep the existing EXCEL_FILE_SIZE_EXCEEDED parsing and i18n mapping, and for any other error leave errorMessage as the generic t('document.document.reindexFailed') fallback unless you add explicit error-code-to-i18n mappings later.

kissghosts · 2026-03-20T09:47:00Z

frontend/src/features/knowledge/document/components/DocumentList.tsx

+        const message = err.message
+        // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code
+        // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size
+        if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) {


我刚刚改权限那边，加了个 error_code 好像可以复用

backend/app/services/knowledge/indexing.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/services/knowledge/indexing.py (1)

91-114: ⚠️ Potential issue | 🟠 Major

Make file_size mandatory to avoid Excel size-limit bypass.

Line 91 makes file_size optional; any caller that omits it will bypass the Excel limit check at Line 113. That weakens the 2MB enforcement contract.

🔧 Proposed fix

 def get_rag_indexing_skip_reason(
     source_type: Optional[str],
     file_extension: Optional[str],
-    file_size: Optional[int] = None,
+    file_size: int,
 ) -> Optional[str]:
@@
-    if normalized_extension in EXCEL_EXTENSIONS:
-        if file_size is not None and file_size > EXCEL_FILE_SIZE_LIMIT:
+    if normalized_extension in EXCEL_EXTENSIONS and file_size > EXCEL_FILE_SIZE_LIMIT:
             size_mb = file_size / (1024 * 1024)
             limit_mb = EXCEL_FILE_SIZE_LIMIT / (1024 * 1024)
             return f"EXCEL_FILE_SIZE_EXCEEDED|{normalized_extension}|{limit_mb:.0f}|{size_mb:.2f}"

-        file_size = None
+        file_size = 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/knowledge/indexing.py` around lines 91 - 114, The
file_size parameter is optional which lets callers omit it and bypass the Excel
2MB check; make file_size mandatory (remove Optional and the default None) in
the function signature that returns the skip reason, update all callers to
provide the document size, and ensure runtime validation around
EXCEL_EXTENSIONS/EXCEL_FILE_SIZE_LIMIT (e.g., raise or return an error/skip
reason if size is not provided for Excel files) so the Excel size-limit cannot
be bypassed; reference the file_size parameter, EXCEL_EXTENSIONS,
EXCEL_FILE_SIZE_LIMIT, and the skip-reason function when making changes.

🧹 Nitpick comments (1)

backend/app/services/knowledge/indexing.py (1)
425-440: run_document_indexing is doing too much in one function.

This function mixes lookup, policy checks, indexing orchestration, status persistence, and summary triggering; splitting into smaller helpers will improve maintainability and testability.

As per coding guidelines, "Function length: Max 50 lines per function (preferred)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/knowledge/indexing.py` around lines 425 - 440,
run_document_indexing is too large and mixes lookup, policy checks,
orchestration, persistence and summary triggering; break it into small helpers:
extract the database/KB lookup logic into a function like
fetch_kb_and_attachment(knowledge_base_id, attachment_id, db), policy checks
into validate_indexing_permissions(user_id, knowledge_base, attachment),
indexing orchestration into orchestrate_indexing(...), status persistence into
persist_index_status(kb_index_info, status, db), and summary logic into
trigger_summary_if_needed(...); then simplify run_document_indexing to call
these helpers in sequence, pass only needed parameters, and move any complex
branching into the new functions to keep run_document_indexing concise and under
~50 lines.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 91-114: The file_size parameter is optional which lets callers
omit it and bypass the Excel 2MB check; make file_size mandatory (remove
Optional and the default None) in the function signature that returns the skip
reason, update all callers to provide the document size, and ensure runtime
validation around EXCEL_EXTENSIONS/EXCEL_FILE_SIZE_LIMIT (e.g., raise or return
an error/skip reason if size is not provided for Excel files) so the Excel
size-limit cannot be bypassed; reference the file_size parameter,
EXCEL_EXTENSIONS, EXCEL_FILE_SIZE_LIMIT, and the skip-reason function when
making changes.

---

Nitpick comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 425-440: run_document_indexing is too large and mixes lookup,
policy checks, orchestration, persistence and summary triggering; break it into
small helpers: extract the database/KB lookup logic into a function like
fetch_kb_and_attachment(knowledge_base_id, attachment_id, db), policy checks
into validate_indexing_permissions(user_id, knowledge_base, attachment),
indexing orchestration into orchestrate_indexing(...), status persistence into
persist_index_status(kb_index_info, status, db), and summary logic into
trigger_summary_if_needed(...); then simplify run_document_indexing to call
these helpers in sequence, pass only needed parameters, and move any complex
branching into the new functions to keep run_document_indexing concise and under
~50 lines.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3824eb33-ed79-49c3-8b03-cd8e3a24cd12

📥 Commits

Reviewing files that changed from the base of the PR and between d54e2bb and f5c6078.

📒 Files selected for processing (1)

backend/app/services/knowledge/indexing.py

- Remove .xls and .xlsx from RAG_INDEXING_DISABLED_EXTENSIONS - Add EXCEL_FILE_SIZE_LIMIT (2MB) and EXCEL_EXTENSIONS constants - Add file_size parameter to get_rag_indexing_skip_reason function - Return error code format for i18n support: EXCEL_FILE_SIZE_EXCEEDED|ext|limit|size - Update orchestrator to pass file_size to indexing functions - Update frontend DocumentItem to show 'Indexing' status when reindexing - Show 'Index Unavailable' when not reindexing and is_active=false - Add i18n translations for Excel file size exceeded error (zh-CN/en) - Update tests to verify Excel files within 2MB limit are allowed

…nstant Remove the unused RAG_INDEXING_DISABLED_EXTENSIONS frozenset constant and its associated extension check logic from the indexing service. The constant was empty and served no functional purpose.

…ing in DocumentList Replace string prefix matching with ApiError.errorCode field check for EXCEL_FILE_SIZE_EXCEEDED error handling. This leverages the existing ApiError class from client.ts for more robust and type-safe error handling.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

frontend/src/features/knowledge/document/components/DocumentItem.tsx (1)

268-276: Remove redundant ternary branch in unavailable tooltip

At Line 274 and Line 500, both branches return the same indexStatus.unavailableHint. This can be simplified.

Proposed cleanup

-                      {isExcelExceedingSizeLimit
-                        ? t('knowledge:document.document.excelFileSizeExceeded', {
-                            extension: document.file_extension,
-                            limit: 2,
-                            size: (document.file_size / (1024 * 1024)).toFixed(2),
-                          })
-                        : !ragConfigured
-                          ? t('knowledge:document.document.indexStatus.unavailableHint')
-                          : t('knowledge:document.document.indexStatus.unavailableHint')}
+                      {isExcelExceedingSizeLimit
+                        ? t('knowledge:document.document.excelFileSizeExceeded', {
+                            extension: document.file_extension,
+                            limit: 2,
+                            size: (document.file_size / (1024 * 1024)).toFixed(2),
+                          })
+                        : t('knowledge:document.document.indexStatus.unavailableHint')}

Also applies to: 494-503

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around
lines 268 - 276, The JSX in DocumentItem.tsx uses a redundant nested ternary
where both the true and false branches return the same
t('knowledge:document.document.indexStatus.unavailableHint') value; update the
expression that checks isExcelExceedingSizeLimit and ragConfigured (used in the
JSX around the unavailable tooltip) to remove the duplicated branch so it only
renders the excel-size message when isExcelExceedingSizeLimit is true, otherwise
render t('knowledge:document.document.indexStatus.unavailableHint'); keep the
existing variables isExcelExceedingSizeLimit and ragConfigured and the
translation key unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 149-152: The Excel detection misses dot-prefixed extensions;
update the normalization used by isExcel so document.file_extension is sanitized
(trim, remove leading '.' if present, and toLowerCase()) before checking against
['xls','xlsx']. Keep EXCEL_FILE_SIZE_LIMIT and isExcelExceedingSizeLimit logic
but compute isExcel from the normalizedExtension (derived from
document.file_extension) to ensure '.xlsx' and 'xlsx' are both recognized.
- Around line 149-152: The UI computes EXCEL_FILE_SIZE_LIMIT, isExcel and
isExcelExceedingSizeLimit but never uses it to prevent user actions; update the
DocumentItem action rendering logic (both the compact and normal action areas
where reindex is offered) to check isExcelExceedingSizeLimit and either hide or
disable the reindex action/button (and show an explanatory tooltip/message) when
true so users cannot trigger guaranteed-to-fail reindex requests for oversized
Excel files.

---

Nitpick comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 268-276: The JSX in DocumentItem.tsx uses a redundant nested
ternary where both the true and false branches return the same
t('knowledge:document.document.indexStatus.unavailableHint') value; update the
expression that checks isExcelExceedingSizeLimit and ragConfigured (used in the
JSX around the unavailable tooltip) to remove the duplicated branch so it only
renders the excel-size message when isExcelExceedingSizeLimit is true, otherwise
render t('knowledge:document.document.indexStatus.unavailableHint'); keep the
existing variables isExcelExceedingSizeLimit and ragConfigured and the
translation key unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8cc521b2-8f19-4821-94aa-7094517ba7db

📥 Commits

Reviewing files that changed from the base of the PR and between f5c6078 and d9e8064.

📒 Files selected for processing (7)

backend/app/services/knowledge/indexing.py
backend/app/services/knowledge/orchestrator.py
backend/tests/services/knowledge/test_orchestrator.py
frontend/src/features/knowledge/document/components/DocumentItem.tsx
frontend/src/features/knowledge/document/components/DocumentList.tsx
frontend/src/i18n/locales/en/knowledge.json
frontend/src/i18n/locales/zh-CN/knowledge.json

✅ Files skipped from review due to trivial changes (1)

frontend/src/i18n/locales/en/knowledge.json

🚧 Files skipped from review as they are similar to previous changes (5)

frontend/src/i18n/locales/zh-CN/knowledge.json
backend/app/services/knowledge/orchestrator.py
frontend/src/features/knowledge/document/components/DocumentList.tsx
backend/tests/services/knowledge/test_orchestrator.py
backend/app/services/knowledge/indexing.py

coderabbitai · 2026-03-22T07:38:00Z

frontend/src/features/knowledge/document/components/DocumentItem.tsx

+  // Check if Excel file exceeds size limit (2MB)
+  const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB
+  const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '')
+  const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT


⚠️ Potential issue | 🟠 Major

Normalize extension before Excel detection to avoid false negatives

At Line 151, the check only matches 'xls'/'xlsx'. If document.file_extension is dot-prefixed (e.g. .xlsx, consistent with backend normalization), Excel files will not be recognized and Line 152 won’t flag size overflow.

Proposed fix

- const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB - const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '') - const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT + const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB + const normalizedExtension = (document.file_extension || '').toLowerCase().replace(/^\./, '') + const isExcel = normalizedExtension === 'xls' || normalizedExtension === 'xlsx' + const isExcelExceedingSizeLimit = + isExcel && typeof document.file_size === 'number' && document.file_size > EXCEL_FILE_SIZE_LIMIT

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Check if Excel file exceeds size limit (2MB)

const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB

const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '')

const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT

// Check if Excel file exceeds size limit (2MB)

const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB

const normalizedExtension = (document.file_extension || '').toLowerCase().replace(/^\./, '')

const isExcel = normalizedExtension === 'xls' || normalizedExtension === 'xlsx'

const isExcelExceedingSizeLimit =

isExcel && typeof document.file_size === 'number' && document.file_size > EXCEL_FILE_SIZE_LIMIT

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around lines 149 - 152, The Excel detection misses dot-prefixed extensions; update the normalization used by isExcel so document.file_extension is sanitized (trim, remove leading '.' if present, and toLowerCase()) before checking against ['xls','xlsx']. Keep EXCEL_FILE_SIZE_LIMIT and isExcelExceedingSizeLimit logic but compute isExcel from the normalizedExtension (derived from document.file_extension) to ensure '.xlsx' and 'xlsx' are both recognized.

⚠️ Potential issue | 🟠 Major

Use isExcelExceedingSizeLimit to block known-failing reindex actions

You already compute the oversize condition, but reindex actions are still exposed in compact/normal action areas. That allows user-triggered requests that are guaranteed to fail for oversized Excel files.

Proposed fix

+ const canReindex = ragConfigured && !document.is_active && !isTable && !isExcelExceedingSizeLimit - {ragConfigured && !document.is_active && !isTable && onReindex && ( + {canReindex && onReindex && ( <DropdownMenuItem onClick={handleReindex} disabled={isReindexing}> ... </DropdownMenuItem> )} - {ragConfigured && !document.is_active && !isTable && onReindex && ( + {canReindex && onReindex && ( <button ... >

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around lines 149 - 152, The UI computes EXCEL_FILE_SIZE_LIMIT, isExcel and isExcelExceedingSizeLimit but never uses it to prevent user actions; update the DocumentItem action rendering logic (both the compact and normal action areas where reindex is offered) to check isExcelExceedingSizeLimit and either hide or disable the reindex action/button (and show an explanatory tooltip/message) when true so users cannot trigger guaranteed-to-fail reindex requests for oversized Excel files.

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

kissghosts reviewed Mar 20, 2026

View reviewed changes

coderabbitai bot reviewed Mar 20, 2026

View reviewed changes

parabala added 3 commits March 22, 2026 14:42

refactor(knowledge): remove empty RAG_INDEXING_DISABLED_EXTENSIONS co…

a9f80c0

…nstant Remove the unused RAG_INDEXING_DISABLED_EXTENSIONS frozenset constant and its associated extension check logic from the indexing service. The constant was empty and served no functional purpose.

parabala force-pushed the feature/allow-index-xlxs branch from f5c6078 to d9e8064 Compare March 22, 2026 07:33

coderabbitai bot reviewed Mar 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(knowledge): allow Excel file indexing with 2MB size limit#805

feat(knowledge): allow Excel file indexing with 2MB size limit#805
parabala wants to merge 3 commits intowecode-ai:mainfrom
parabala:feature/allow-index-xlxs

parabala commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 20, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 20, 2026

Uh oh!

coderabbitai bot Mar 20, 2026

Uh oh!

kissghosts Mar 20, 2026

Uh oh!

parabala Mar 22, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

parabala commented Mar 20, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

kissghosts Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

parabala Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

parabala commented Mar 20, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 20, 2026 •

edited

Loading