perf: add 3-layer early date filtering to skip irrelevant files and entries#869
perf: add 3-layer early date filtering to skip irrelevant files and entries#869pbuchman wants to merge 3 commits intoryoppippi:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds date-range parsing and mtime-based pre-filtering to the ccusage data loader: new helpers (parseSinceDate, filterFilesByMtime, isOutsideDateRange, USAGE_LINE_MARKER), a DateFilter type merged into exported LoadOptions, and entry-level skipping in daily/session/block loaders; tests and package/manifest touched. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/ccusage/src/data-loader.ts (1)
853-860: Consolidate duplicated entry-level date skip logicThe same
dateCompact+since/untilchecks are repeated in three loaders. Extracting one helper reduces drift risk and keeps behavior aligned with future date-range changes.♻️ Suggested consolidation
+function isOutsideDateRange(date: string, since?: string, until?: string): boolean { + const dateCompact = date.substring(0, 10).replace(/-/g, ''); + if (since != null && since !== '' && dateCompact < since) { + return true; + } + if (until != null && until !== '' && dateCompact > until) { + return true; + } + return false; +}-const dateCompact = entryDate.substring(0, 10).replace(/-/g, ''); -if (options?.since != null && dateCompact < options.since) { - return; -} -if (options?.until != null && dateCompact > options.until) { +if (isOutsideDateRange(entryDate, options?.since, options?.until)) { return; }Also applies to: 1031-1039, 1470-1478
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 853 - 860, Extract the duplicated date-range skip logic into a single helper (e.g., isOutsideDateRange or shouldSkipByDate) that accepts the raw entry date string and the loader options and returns a boolean indicating whether to skip; inside the helper compute dateCompact = date.substring(0,10).replace(/-/g,'') and check options?.since and options?.until with proper null/undefined guards, then replace the three inline blocks (the occurrences that compute dateCompact and compare to options.since/options.until) by calling this helper and returning when it indicates the entry is out of range so all loaders (the three places shown) share the same behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 733-740: The current check in filterFilesByMtime (and the similar
block at the later occurrence) only tests for null/undefined (since == null) so
an empty string ("") is treated as a valid cutoff and leads to invalid
parse/comparisons; update the guard to treat empty or whitespace-only since
values as "no filter" (e.g. check if !since or since.trim() === "" before
returning files) and only call parseSinceDate when since is non-empty, ensuring
mtime filtering runs correctly.
- Around line 722-723: The functions parseSinceDate and filterFilesByMtime are
internal-only and should not be part of the module API; remove the export
keyword from their declarations (i.e., change "export function parseSinceDate"
and "export function filterFilesByMtime" to non-exported function declarations)
so they remain usable inside data-loader.ts and by in-file vitest blocks but are
not exported to other modules; verify there are no external imports relying on
these symbols and run type-check/tests.
- Line 16: Tests in apps/ccusage/src/data-loader.ts use forbidden dynamic `await
import()` inside describe callbacks; move those modules to top-level imports
(add `import { utimes } from 'node:fs/promises' to the existing import on line
16 and add `import { createFixture } from 'fs-fixture'` at top), then remove
`async` from the affected describe callbacks and replace each `await import()`
with the corresponding top-level imported symbol usages for the tests named
`filterFilesByMtime`, `mtime + loadDailyUsageData integration`,
`loadSessionUsageById`, `loadSessionData early date filtering`, and
`loadSessionBlockData early date filtering` so no dynamic imports remain.
---
Nitpick comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 853-860: Extract the duplicated date-range skip logic into a
single helper (e.g., isOutsideDateRange or shouldSkipByDate) that accepts the
raw entry date string and the loader options and returns a boolean indicating
whether to skip; inside the helper compute dateCompact =
date.substring(0,10).replace(/-/g,'') and check options?.since and
options?.until with proper null/undefined guards, then replace the three inline
blocks (the occurrences that compute dateCompact and compare to
options.since/options.until) by calling this helper and returning when it
indicates the entry is out of range so all loaders (the three places shown)
share the same behavior.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
apps/ccusage/src/data-loader.ts (1)
1599-1601:⚠️ Potential issue | 🟠 Major
await import()still exists in test blocks and violates repo policy.There are remaining dynamic imports in
describeblocks. This is already a previously reported issue and appears unresolved.#!/bin/bash # Verify no dynamic imports remain in TypeScript files (expected: no matches) rg -nP 'await\s+import\(' --type=tsAs per coding guidelines:
apps/ccusage/**/*.ts: "NEVER useawait import()dynamic imports anywhere (especially in tests)".Also applies to: 4768-4774
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 1599 - 1601, The test contains a dynamic "await import('fs-fixture')" inside the describe block (describe('loadSessionUsageById')) which violates the no-dynamic-import policy; fix it by replacing the dynamic import with a static top-level import (e.g., add "import { createFixture } from 'fs-fixture';" at the top of the file) and remove the "await import" usage inside the describe block so the test uses the statically imported createFixture symbol instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1037-1040: The early per-entry skip using formatDate(...) and
isOutsideDateRange(...) (entryDate) changes session-level filtering semantics
because sessions are later filtered by aggregated lastActivity; instead preserve
previous behavior by removing or deferring the entry-level date skip in the
data-loading path that builds sessions (so mixed-range sessions still include
all entries for correct lastActivity calculation), and ensure session pruning
remains based on the aggregated lastActivity check (the code around
entryDate/isOutsideDateRange should not drop entries used by session
aggregation; adjust the filtering logic where sessions are assembled and keep
the existing lastActivity-based session filter).
- Around line 1471-1474: Remove the per-entry early skip (the formatDate +
isOutsideDateRange check) so identifySessionBlocks receives the full
chronological stream; instead apply the existing date-range filtering to the
constructed session blocks by checking each block.startTime (reuse the current
block-level date checks around block.startTime) so blocks are preserved and not
split/shifted by entry-level drops. Ensure identifySessionBlocks and any callers
still use the original entries array and that the only date filtering happens
after blocks are built using block.startTime.
---
Duplicate comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1599-1601: The test contains a dynamic "await
import('fs-fixture')" inside the describe block
(describe('loadSessionUsageById')) which violates the no-dynamic-import policy;
fix it by replacing the dynamic import with a static top-level import (e.g., add
"import { createFixture } from 'fs-fixture';" at the top of the file) and remove
the "await import" usage inside the describe block so the test uses the
statically imported createFixture symbol instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
f2f7af3 to
a837500
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
apps/ccusage/src/data-loader.ts (2)
1479-1482:⚠️ Potential issue | 🟠 MajorEntry-level date skip can reshape session block boundaries
On Line 1479–Line 1482, removing entries before
identifySessionBlocks(...)can split/shift blocks near boundaries, then block-level filtering operates on already-altered blocks.Suggested fix
- const date = formatDate(data.timestamp, options?.timezone, DEFAULT_LOCALE); - if (isOutsideDateRange(date, options?.since, options?.until)) { - return; - }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 1479 - 1482, Currently individual entries are being filtered by date before calling identifySessionBlocks (the formatDate + isOutsideDateRange check), which can split or shift session blocks near the boundaries; instead, stop dropping entries before identifySessionBlocks: pass the full set of entries into identifySessionBlocks (reference identifySessionBlocks and the variables formatDate, isOutsideDateRange, options?.since, options?.until, DEFAULT_LOCALE), then after session blocks are constructed apply date-based filtering at the block level (e.g., drop or trim session blocks whose timestamps fall outside options?.since/options?.until) so block boundaries remain correct.
1044-1047:⚠️ Potential issue | 🟠 MajorEntry-level date skip changes session aggregation semantics
On Line 1044–Line 1047, filtering entries before session aggregation can alter totals for sessions whose
lastActivityis in range but contain earlier/later entries outside range. This is a behavior change versus filtering sessions by aggregatedlastActivity.Suggested fix
- const entryDate = formatDate(data.timestamp, options?.timezone, DEFAULT_LOCALE); - if (isOutsideDateRange(entryDate, options?.since, options?.until)) { - return; - }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ccusage/src/data-loader.ts` around lines 1044 - 1047, The current early-return using formatDate(data.timestamp) and isOutsideDateRange(entryDate, options?.since, options?.until) skips individual entries and changes session totals; instead stop filtering at the entry level—remove or defer this check in the entries processing loop and let all entries contribute to session aggregation, then after sessions are aggregated filter sessions by their aggregated lastActivity (e.g., call formatDate(session.lastActivity, options?.timezone, DEFAULT_LOCALE) and isOutsideDateRange(...) against options?.since/options?.until) so totals remain correct while still excluding out-of-range sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 1479-1482: Currently individual entries are being filtered by date
before calling identifySessionBlocks (the formatDate + isOutsideDateRange
check), which can split or shift session blocks near the boundaries; instead,
stop dropping entries before identifySessionBlocks: pass the full set of entries
into identifySessionBlocks (reference identifySessionBlocks and the variables
formatDate, isOutsideDateRange, options?.since, options?.until, DEFAULT_LOCALE),
then after session blocks are constructed apply date-based filtering at the
block level (e.g., drop or trim session blocks whose timestamps fall outside
options?.since/options?.until) so block boundaries remain correct.
- Around line 1044-1047: The current early-return using
formatDate(data.timestamp) and isOutsideDateRange(entryDate, options?.since,
options?.until) skips individual entries and changes session totals; instead
stop filtering at the entry level—remove or defer this check in the entries
processing loop and let all entries contribute to session aggregation, then
after sessions are aggregated filter sessions by their aggregated lastActivity
(e.g., call formatDate(session.lastActivity, options?.timezone, DEFAULT_LOCALE)
and isOutsideDateRange(...) against options?.since/options?.until) so totals
remain correct while still excluding out-of-range sessions.
Remove await import() dynamic imports from test blocks (repo policy). Remove entry-level date skip from session/block loaders to preserve aggregation semantics — sessions and blocks are filtered post-aggregation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ccusage/src/data-loader.ts`:
- Around line 726-733: The current guard in filterFilesByMtime accepts any
8-digit string, so validate that the parsed date is a real calendar date before
enabling mtime pre-filter: call parseSinceDate(since) into sinceDate, then
verify sinceDate is valid (e.g., !isNaN(sinceDate.getTime())) and that its
year/month/day match the numeric components parsed from the original since
string (so 20230231 is rejected). If validation fails, return files without
applying the cutoff; only subtract one day and compute sinceMs when the parsed
date is confirmed valid.
Reject impossible dates like 20230231 that pass the 8-digit regex but would produce an unintended cutoff via Date rollover. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
That is an excellent addition. Thank you for both, letting us know in OpenUsage and submitting the PR here. I hope this can be merged soon. |
Summary
USAGE_LINE_MARKER) to skipJSON.parseon non-usage linesloadDailyUsageData,loadSessionData, andloadSessionBlockDataProblem
When
--since/--untilare provided, ccusage currently:This means a single-day query takes the same time as a full scan.
Benchmarks
Machine: Apple M2 Pro, 32 GB RAM, macOS 15.6
Dataset: 4,637 JSONL files, 572 MB total (largest file: 469 MB)
--since 20260227 --until 20260227)--since 20260220)The mtime filter provides the biggest wins for short date ranges (the most common use case for integrations like OpenUsage). The substring pre-check also improves full-scan performance by ~14% by skipping
JSON.parseon non-usage lines.How it works
1. File-level mtime pre-filter (
filterFilesByMtime)Before reading any file content, checks each file's
mtimeagainst the--sincedate (with 1-day buffer for timezone safety). Files whosemtimeis older thansince - 1 dayare skipped entirely.2. Line-level substring pre-check (
USAGE_LINE_MARKER)Before calling
JSON.parseon each line, checks whether the line contains the"costUSD"substring (theUSAGE_LINE_MARKER). In typical JSONL files, ~91.5% of lines are non-usage entries (conversation turns, system messages, etc.) that don't contain cost data. This skips expensive JSON parsing for those lines entirely.3. Entry-level early date skip
Inside the per-line processing loop, after
formatDatebut before the expensivecalculateCostForEntry, compares the entry's date againstsince/untiland skips out-of-range entries. Uses the same comparison logic as the existingfilterByDateRangein_date-utils.ts.Backwards compatibility
This is a backwards-compatible change:
filterByDateRange--since/--untilare not providedstat()are kept (not filtered), ensuring no data loss on permission issuesContext
Discovered while debugging slow load times in OpenUsage, a menu bar app that uses ccusage to display Claude Code token usage. With 15-second timeouts per runner, ccusage always times out on machines with heavy usage, causing 60+ second delays. See: robinebers/openusage#249
Testing
parseSinceDate(YYYYMMDD parsing, edge cases)filterFilesByMtime(mtime edge cases, stat errors, buffer behavior)USAGE_LINE_MARKERsubstring pre-check (true/false positives, edge cases)utimes()pnpm run format && pnpm typecheck && pnpm run testSummary by CodeRabbit
New Features
Improvements
Tests