feat(knowledge): allow Excel file indexing with 2MB size limit#805
feat(knowledge): allow Excel file indexing with 2MB size limit#805parabala wants to merge 3 commits intowecode-ai:mainfrom
Conversation
📝 WalkthroughWalkthroughExcel files are no longer unconditionally excluded by extension; indexing skips Excel only when file_size is known and exceeds a 2MB limit. Backend APIs, orchestrator flows, frontend UI, tests, and i18n were updated to propagate and surface this size-based skip reason. Changes
Sequence Diagram(s)sequenceDiagram
participant UI as DocumentList / DocumentItem (Frontend)
participant Orch as Orchestrator (Backend)
participant IndexSvc as Indexing Service
participant DB as KnowledgeDocument (DB)
UI->>Orch: request reindex / trigger indexing
Orch->>DB: fetch metadata (file_extension, file_size)
Orch->>IndexSvc: start indexing (passes source_type, file_extension, file_size)
IndexSvc->>IndexSvc: get_rag_indexing_skip_reason(source_type, file_extension, file_size)
alt Excel and file_size > 2MB
IndexSvc-->>Orch: EXCEL_FILE_SIZE_EXCEEDED|ext|limit|size (skip)
Orch-->>UI: return error token (frontend shows localized toast)
else allowed (size unknown or <= limit)
IndexSvc->>IndexSvc: schedule/perform indexing
IndexSvc-->>Orch: indexing scheduled/started
Orch-->>UI: success
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable sequence diagrams in the walkthrough.Disable the |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/knowledge/indexing.py (1)
502-529:⚠️ Potential issue | 🟠 MajorDon't rely solely on
KnowledgeDocument.file_sizefor the Excel gate.When
document_idis present, this path never consults the attachment size. Older rows can still havefile_sizeasNULL/0, so the new check fails open and a >2MB Excel file can still be indexed/reindexed even thoughattachment_idis available here. Please fall back to attachment metadata before callingget_rag_indexing_skip_reason(), and fail closed for Excel if neither size is trustworthy.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/knowledge/indexing.py` around lines 502 - 529, When resolving file metadata before calling get_rag_indexing_skip_reason, don't trust KnowledgeDocument.file_size alone: in the document_id branch (KnowledgeDocument) if document.file_size is None or 0 and attachment_id is present, query SubtaskContext (filtering ContextType.ATTACHMENT) and override file_size/file_extension with the attachment's values; otherwise if after both checks file_size is still missing/zero and source_type indicates Excel treat size as "too large" (fail closed) so get_rag_indexing_skip_reason will skip indexing. Ensure you still populate source_type from KnowledgeDocument when available and only fall back to attachment metadata for file_size/file_extension.
🧹 Nitpick comments (1)
frontend/src/features/knowledge/document/components/DocumentItem.tsx (1)
126-129: Extract the Excel limit/tooltip logic into a shared helper.The 2MB policy is duplicated here in both layouts (
EXCEL_FILE_SIZE_LIMIT,limit: 2, and the repeated size formatting). A small helper/shared constant would keep the compact and table UIs in sync and reduce drift from the backend policy over time.Also applies to: 234-239, 442-447
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around lines 126 - 129, The Excel size-limit logic (EXCEL_FILE_SIZE_LIMIT, isExcel, isExcelExceedingSizeLimit and the repeated formatted "2 MB" text) is duplicated across multiple UI components; extract a single shared helper that exports the numeric limit constant (e.g., EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024) and utility functions like isExcel(fileExtension) and isExcelExceedingLimit(file) plus a formattedLimit getter used by tooltips; replace local uses of EXCEL_FILE_SIZE_LIMIT, isExcel and isExcelExceedingSizeLimit in DocumentItem and the other occurrences with imports from that helper so both compact and table UIs share the same source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/services/knowledge/test_orchestrator.py`:
- Around line 422-426: The test
test_excel_documents_within_size_limit_are_allowed never exercises the size
check because it omits the file_size argument; update the test to call
get_rag_indexing_skip_reason("file", "xlsx", file_size=2 * 1024 * 1024) so the
2MB boundary is evaluated and assert the result is None, and optionally add an
additional assertion for a size just above the limit (e.g., file_size=2 * 1024 *
1024 + 1) to verify the skip reason is returned.
In `@frontend/src/features/knowledge/document/components/DocumentList.tsx`:
- Around line 382-400: In the catch block inside DocumentList.tsx that builds
the toast error (the code setting errorMessage and using
t('document.document.reindexFailed')), remove the else branch that assigns the
raw backend message (errorMessage = message) so we do not surface unlocalized
server text; keep the existing EXCEL_FILE_SIZE_EXCEEDED parsing and i18n
mapping, and for any other error leave errorMessage as the generic
t('document.document.reindexFailed') fallback unless you add explicit
error-code-to-i18n mappings later.
---
Outside diff comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 502-529: When resolving file metadata before calling
get_rag_indexing_skip_reason, don't trust KnowledgeDocument.file_size alone: in
the document_id branch (KnowledgeDocument) if document.file_size is None or 0
and attachment_id is present, query SubtaskContext (filtering
ContextType.ATTACHMENT) and override file_size/file_extension with the
attachment's values; otherwise if after both checks file_size is still
missing/zero and source_type indicates Excel treat size as "too large" (fail
closed) so get_rag_indexing_skip_reason will skip indexing. Ensure you still
populate source_type from KnowledgeDocument when available and only fall back to
attachment metadata for file_size/file_extension.
---
Nitpick comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 126-129: The Excel size-limit logic (EXCEL_FILE_SIZE_LIMIT,
isExcel, isExcelExceedingSizeLimit and the repeated formatted "2 MB" text) is
duplicated across multiple UI components; extract a single shared helper that
exports the numeric limit constant (e.g., EXCEL_FILE_SIZE_LIMIT = 2 * 1024 *
1024) and utility functions like isExcel(fileExtension) and
isExcelExceedingLimit(file) plus a formattedLimit getter used by tooltips;
replace local uses of EXCEL_FILE_SIZE_LIMIT, isExcel and
isExcelExceedingSizeLimit in DocumentItem and the other occurrences with imports
from that helper so both compact and table UIs share the same source of truth.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 35414538-0e85-41ef-8b70-84e004b3c793
📒 Files selected for processing (7)
backend/app/services/knowledge/indexing.pybackend/app/services/knowledge/orchestrator.pybackend/tests/services/knowledge/test_orchestrator.pyfrontend/src/features/knowledge/document/components/DocumentItem.tsxfrontend/src/features/knowledge/document/components/DocumentList.tsxfrontend/src/i18n/locales/en/knowledge.jsonfrontend/src/i18n/locales/zh-CN/knowledge.json
| def test_excel_documents_within_size_limit_are_allowed(self): | ||
| """Test Excel extensions within 2MB size limit are allowed for RAG indexing.""" | ||
| reason = get_rag_indexing_skip_reason("file", "xlsx") | ||
|
|
||
| assert reason == "Excel documents (.xlsx) are excluded from RAG indexing" | ||
| assert reason is None |
There was a problem hiding this comment.
This test never hits the "within 2MB" branch.
Omitting file_size makes get_rag_indexing_skip_reason() return None regardless of the workbook size, so the new size boundary is not exercised here. Pass 2 * 1024 * 1024 explicitly, and ideally add an exact-boundary assertion.
🧪 Suggested fix
- reason = get_rag_indexing_skip_reason("file", "xlsx")
+ reason = get_rag_indexing_skip_reason("file", "xlsx", 2 * 1024 * 1024)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/services/knowledge/test_orchestrator.py` around lines 422 -
426, The test test_excel_documents_within_size_limit_are_allowed never exercises
the size check because it omits the file_size argument; update the test to call
get_rag_indexing_skip_reason("file", "xlsx", file_size=2 * 1024 * 1024) so the
2MB boundary is evaluated and assert the result is None, and optionally add an
additional assertion for a size just above the limit (e.g., file_size=2 * 1024 *
1024 + 1) to verify the skip reason is returned.
| } catch (err) { | ||
| // Parse error message and use i18n translation if it's a known error code | ||
| let errorMessage = t('document.document.reindexFailed') | ||
| if (err instanceof Error) { | ||
| const message = err.message | ||
| // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code | ||
| // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size | ||
| if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) { | ||
| const parts = message.split('|') | ||
| if (parts.length === 4) { | ||
| errorMessage = t('document.document.excelFileSizeExceeded', { | ||
| extension: parts[1], | ||
| limit: parts[2], | ||
| size: parts[3], | ||
| }) | ||
| } | ||
| } else { | ||
| // Use the original error message for other errors | ||
| errorMessage = message |
There was a problem hiding this comment.
Don't surface raw backend errors in the reindex toast.
Only EXCEL_FILE_SIZE_EXCEEDED is mapped to i18n here. The else branch now shows raw backend messages for every other failure, which regresses localization and leaks server wording straight into user-facing toasts. Keep the generic reindexFailed fallback unless you add explicit error-code mappings for those cases.
💡 Suggested change
- } else {
- // Use the original error message for other errors
- errorMessage = message
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } catch (err) { | |
| // Parse error message and use i18n translation if it's a known error code | |
| let errorMessage = t('document.document.reindexFailed') | |
| if (err instanceof Error) { | |
| const message = err.message | |
| // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code | |
| // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size | |
| if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) { | |
| const parts = message.split('|') | |
| if (parts.length === 4) { | |
| errorMessage = t('document.document.excelFileSizeExceeded', { | |
| extension: parts[1], | |
| limit: parts[2], | |
| size: parts[3], | |
| }) | |
| } | |
| } else { | |
| // Use the original error message for other errors | |
| errorMessage = message | |
| } catch (err) { | |
| // Parse error message and use i18n translation if it's a known error code | |
| let errorMessage = t('document.document.reindexFailed') | |
| if (err instanceof Error) { | |
| const message = err.message | |
| // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code | |
| // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size | |
| if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) { | |
| const parts = message.split('|') | |
| if (parts.length === 4) { | |
| errorMessage = t('document.document.excelFileSizeExceeded', { | |
| extension: parts[1], | |
| limit: parts[2], | |
| size: parts[3], | |
| }) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontend/src/features/knowledge/document/components/DocumentList.tsx` around
lines 382 - 400, In the catch block inside DocumentList.tsx that builds the
toast error (the code setting errorMessage and using
t('document.document.reindexFailed')), remove the else branch that assigns the
raw backend message (errorMessage = message) so we do not surface unlocalized
server text; keep the existing EXCEL_FILE_SIZE_EXCEEDED parsing and i18n
mapping, and for any other error leave errorMessage as the generic
t('document.document.reindexFailed') fallback unless you add explicit
error-code-to-i18n mappings later.
| const message = err.message | ||
| // Check if it's an EXCEL_FILE_SIZE_EXCEEDED error code | ||
| // Format: EXCEL_FILE_SIZE_EXCEEDED|extension|limit|size | ||
| if (message.startsWith('EXCEL_FILE_SIZE_EXCEEDED|')) { |
There was a problem hiding this comment.
我刚刚改权限那边,加了个 error_code 好像可以复用
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/knowledge/indexing.py (1)
91-114:⚠️ Potential issue | 🟠 MajorMake
file_sizemandatory to avoid Excel size-limit bypass.Line 91 makes
file_sizeoptional; any caller that omits it will bypass the Excel limit check at Line 113. That weakens the 2MB enforcement contract.🔧 Proposed fix
def get_rag_indexing_skip_reason( source_type: Optional[str], file_extension: Optional[str], - file_size: Optional[int] = None, + file_size: int, ) -> Optional[str]: @@ - if normalized_extension in EXCEL_EXTENSIONS: - if file_size is not None and file_size > EXCEL_FILE_SIZE_LIMIT: + if normalized_extension in EXCEL_EXTENSIONS and file_size > EXCEL_FILE_SIZE_LIMIT: size_mb = file_size / (1024 * 1024) limit_mb = EXCEL_FILE_SIZE_LIMIT / (1024 * 1024) return f"EXCEL_FILE_SIZE_EXCEEDED|{normalized_extension}|{limit_mb:.0f}|{size_mb:.2f}"- file_size = None + file_size = 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/knowledge/indexing.py` around lines 91 - 114, The file_size parameter is optional which lets callers omit it and bypass the Excel 2MB check; make file_size mandatory (remove Optional and the default None) in the function signature that returns the skip reason, update all callers to provide the document size, and ensure runtime validation around EXCEL_EXTENSIONS/EXCEL_FILE_SIZE_LIMIT (e.g., raise or return an error/skip reason if size is not provided for Excel files) so the Excel size-limit cannot be bypassed; reference the file_size parameter, EXCEL_EXTENSIONS, EXCEL_FILE_SIZE_LIMIT, and the skip-reason function when making changes.
🧹 Nitpick comments (1)
backend/app/services/knowledge/indexing.py (1)
425-440:run_document_indexingis doing too much in one function.This function mixes lookup, policy checks, indexing orchestration, status persistence, and summary triggering; splitting into smaller helpers will improve maintainability and testability.
As per coding guidelines, "Function length: Max 50 lines per function (preferred)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/knowledge/indexing.py` around lines 425 - 440, run_document_indexing is too large and mixes lookup, policy checks, orchestration, persistence and summary triggering; break it into small helpers: extract the database/KB lookup logic into a function like fetch_kb_and_attachment(knowledge_base_id, attachment_id, db), policy checks into validate_indexing_permissions(user_id, knowledge_base, attachment), indexing orchestration into orchestrate_indexing(...), status persistence into persist_index_status(kb_index_info, status, db), and summary logic into trigger_summary_if_needed(...); then simplify run_document_indexing to call these helpers in sequence, pass only needed parameters, and move any complex branching into the new functions to keep run_document_indexing concise and under ~50 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 91-114: The file_size parameter is optional which lets callers
omit it and bypass the Excel 2MB check; make file_size mandatory (remove
Optional and the default None) in the function signature that returns the skip
reason, update all callers to provide the document size, and ensure runtime
validation around EXCEL_EXTENSIONS/EXCEL_FILE_SIZE_LIMIT (e.g., raise or return
an error/skip reason if size is not provided for Excel files) so the Excel
size-limit cannot be bypassed; reference the file_size parameter,
EXCEL_EXTENSIONS, EXCEL_FILE_SIZE_LIMIT, and the skip-reason function when
making changes.
---
Nitpick comments:
In `@backend/app/services/knowledge/indexing.py`:
- Around line 425-440: run_document_indexing is too large and mixes lookup,
policy checks, orchestration, persistence and summary triggering; break it into
small helpers: extract the database/KB lookup logic into a function like
fetch_kb_and_attachment(knowledge_base_id, attachment_id, db), policy checks
into validate_indexing_permissions(user_id, knowledge_base, attachment),
indexing orchestration into orchestrate_indexing(...), status persistence into
persist_index_status(kb_index_info, status, db), and summary logic into
trigger_summary_if_needed(...); then simplify run_document_indexing to call
these helpers in sequence, pass only needed parameters, and move any complex
branching into the new functions to keep run_document_indexing concise and under
~50 lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3824eb33-ed79-49c3-8b03-cd8e3a24cd12
📒 Files selected for processing (1)
backend/app/services/knowledge/indexing.py
- Remove .xls and .xlsx from RAG_INDEXING_DISABLED_EXTENSIONS - Add EXCEL_FILE_SIZE_LIMIT (2MB) and EXCEL_EXTENSIONS constants - Add file_size parameter to get_rag_indexing_skip_reason function - Return error code format for i18n support: EXCEL_FILE_SIZE_EXCEEDED|ext|limit|size - Update orchestrator to pass file_size to indexing functions - Update frontend DocumentItem to show 'Indexing' status when reindexing - Show 'Index Unavailable' when not reindexing and is_active=false - Add i18n translations for Excel file size exceeded error (zh-CN/en) - Update tests to verify Excel files within 2MB limit are allowed
…nstant Remove the unused RAG_INDEXING_DISABLED_EXTENSIONS frozenset constant and its associated extension check logic from the indexing service. The constant was empty and served no functional purpose.
…ing in DocumentList Replace string prefix matching with ApiError.errorCode field check for EXCEL_FILE_SIZE_EXCEEDED error handling. This leverages the existing ApiError class from client.ts for more robust and type-safe error handling.
f5c6078 to
d9e8064
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
frontend/src/features/knowledge/document/components/DocumentItem.tsx (1)
268-276: Remove redundant ternary branch in unavailable tooltipAt Line 274 and Line 500, both branches return the same
indexStatus.unavailableHint. This can be simplified.Proposed cleanup
- {isExcelExceedingSizeLimit - ? t('knowledge:document.document.excelFileSizeExceeded', { - extension: document.file_extension, - limit: 2, - size: (document.file_size / (1024 * 1024)).toFixed(2), - }) - : !ragConfigured - ? t('knowledge:document.document.indexStatus.unavailableHint') - : t('knowledge:document.document.indexStatus.unavailableHint')} + {isExcelExceedingSizeLimit + ? t('knowledge:document.document.excelFileSizeExceeded', { + extension: document.file_extension, + limit: 2, + size: (document.file_size / (1024 * 1024)).toFixed(2), + }) + : t('knowledge:document.document.indexStatus.unavailableHint')}Also applies to: 494-503
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around lines 268 - 276, The JSX in DocumentItem.tsx uses a redundant nested ternary where both the true and false branches return the same t('knowledge:document.document.indexStatus.unavailableHint') value; update the expression that checks isExcelExceedingSizeLimit and ragConfigured (used in the JSX around the unavailable tooltip) to remove the duplicated branch so it only renders the excel-size message when isExcelExceedingSizeLimit is true, otherwise render t('knowledge:document.document.indexStatus.unavailableHint'); keep the existing variables isExcelExceedingSizeLimit and ragConfigured and the translation key unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 149-152: The Excel detection misses dot-prefixed extensions;
update the normalization used by isExcel so document.file_extension is sanitized
(trim, remove leading '.' if present, and toLowerCase()) before checking against
['xls','xlsx']. Keep EXCEL_FILE_SIZE_LIMIT and isExcelExceedingSizeLimit logic
but compute isExcel from the normalizedExtension (derived from
document.file_extension) to ensure '.xlsx' and 'xlsx' are both recognized.
- Around line 149-152: The UI computes EXCEL_FILE_SIZE_LIMIT, isExcel and
isExcelExceedingSizeLimit but never uses it to prevent user actions; update the
DocumentItem action rendering logic (both the compact and normal action areas
where reindex is offered) to check isExcelExceedingSizeLimit and either hide or
disable the reindex action/button (and show an explanatory tooltip/message) when
true so users cannot trigger guaranteed-to-fail reindex requests for oversized
Excel files.
---
Nitpick comments:
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx`:
- Around line 268-276: The JSX in DocumentItem.tsx uses a redundant nested
ternary where both the true and false branches return the same
t('knowledge:document.document.indexStatus.unavailableHint') value; update the
expression that checks isExcelExceedingSizeLimit and ragConfigured (used in the
JSX around the unavailable tooltip) to remove the duplicated branch so it only
renders the excel-size message when isExcelExceedingSizeLimit is true, otherwise
render t('knowledge:document.document.indexStatus.unavailableHint'); keep the
existing variables isExcelExceedingSizeLimit and ragConfigured and the
translation key unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8cc521b2-8f19-4821-94aa-7094517ba7db
📒 Files selected for processing (7)
backend/app/services/knowledge/indexing.pybackend/app/services/knowledge/orchestrator.pybackend/tests/services/knowledge/test_orchestrator.pyfrontend/src/features/knowledge/document/components/DocumentItem.tsxfrontend/src/features/knowledge/document/components/DocumentList.tsxfrontend/src/i18n/locales/en/knowledge.jsonfrontend/src/i18n/locales/zh-CN/knowledge.json
✅ Files skipped from review due to trivial changes (1)
- frontend/src/i18n/locales/en/knowledge.json
🚧 Files skipped from review as they are similar to previous changes (5)
- frontend/src/i18n/locales/zh-CN/knowledge.json
- backend/app/services/knowledge/orchestrator.py
- frontend/src/features/knowledge/document/components/DocumentList.tsx
- backend/tests/services/knowledge/test_orchestrator.py
- backend/app/services/knowledge/indexing.py
| // Check if Excel file exceeds size limit (2MB) | ||
| const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB | ||
| const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '') | ||
| const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT |
There was a problem hiding this comment.
Normalize extension before Excel detection to avoid false negatives
At Line 151, the check only matches 'xls'/'xlsx'. If document.file_extension is dot-prefixed (e.g. .xlsx, consistent with backend normalization), Excel files will not be recognized and Line 152 won’t flag size overflow.
Proposed fix
- const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB
- const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '')
- const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT
+ const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB
+ const normalizedExtension = (document.file_extension || '').toLowerCase().replace(/^\./, '')
+ const isExcel = normalizedExtension === 'xls' || normalizedExtension === 'xlsx'
+ const isExcelExceedingSizeLimit =
+ isExcel && typeof document.file_size === 'number' && document.file_size > EXCEL_FILE_SIZE_LIMIT📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Check if Excel file exceeds size limit (2MB) | |
| const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB | |
| const isExcel = ['xls', 'xlsx'].includes(document.file_extension?.toLowerCase() || '') | |
| const isExcelExceedingSizeLimit = isExcel && document.file_size > EXCEL_FILE_SIZE_LIMIT | |
| // Check if Excel file exceeds size limit (2MB) | |
| const EXCEL_FILE_SIZE_LIMIT = 2 * 1024 * 1024 // 2MB | |
| const normalizedExtension = (document.file_extension || '').toLowerCase().replace(/^\./, '') | |
| const isExcel = normalizedExtension === 'xls' || normalizedExtension === 'xlsx' | |
| const isExcelExceedingSizeLimit = | |
| isExcel && typeof document.file_size === 'number' && document.file_size > EXCEL_FILE_SIZE_LIMIT |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around
lines 149 - 152, The Excel detection misses dot-prefixed extensions; update the
normalization used by isExcel so document.file_extension is sanitized (trim,
remove leading '.' if present, and toLowerCase()) before checking against
['xls','xlsx']. Keep EXCEL_FILE_SIZE_LIMIT and isExcelExceedingSizeLimit logic
but compute isExcel from the normalizedExtension (derived from
document.file_extension) to ensure '.xlsx' and 'xlsx' are both recognized.
Use isExcelExceedingSizeLimit to block known-failing reindex actions
You already compute the oversize condition, but reindex actions are still exposed in compact/normal action areas. That allows user-triggered requests that are guaranteed to fail for oversized Excel files.
Proposed fix
+ const canReindex = ragConfigured && !document.is_active && !isTable && !isExcelExceedingSizeLimit
- {ragConfigured && !document.is_active && !isTable && onReindex && (
+ {canReindex && onReindex && (
<DropdownMenuItem onClick={handleReindex} disabled={isReindexing}>
...
</DropdownMenuItem>
)}
- {ragConfigured && !document.is_active && !isTable && onReindex && (
+ {canReindex && onReindex && (
<button
...
>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontend/src/features/knowledge/document/components/DocumentItem.tsx` around
lines 149 - 152, The UI computes EXCEL_FILE_SIZE_LIMIT, isExcel and
isExcelExceedingSizeLimit but never uses it to prevent user actions; update the
DocumentItem action rendering logic (both the compact and normal action areas
where reindex is offered) to check isExcelExceedingSizeLimit and either hide or
disable the reindex action/button (and show an explanatory tooltip/message) when
true so users cannot trigger guaranteed-to-fail reindex requests for oversized
Excel files.
Summary by CodeRabbit
New Features
Bug Fixes / UX
Tests