
Conversation

@mdhruvil (Owner)

This PR rewrites the whole git backend to support an R2 + D1 hybrid architecture.

- add vitest
- add stream buffer for exact byte count reading from ReadableStream (see the sketch below)
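
A minimal sketch of the exact-byte-count idea, using a hypothetical `readExactly` helper (the PR's actual `StreamBuffer` API is not shown in this thread and may differ):

```ts
// Buffer incoming chunks until exactly `n` bytes are available, carrying
// any surplus over to the next call. Names and API are hypothetical.
async function readExactly(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  n: number,
  leftover: Uint8Array[] // surplus bytes carried between calls
): Promise<Uint8Array> {
  let have = leftover.reduce((sum, chunk) => sum + chunk.length, 0);
  while (have < n) {
    const { done, value } = await reader.read();
    if (done) throw new Error(`stream ended ${n - have} bytes short`);
    leftover.push(value);
    have += value.length;
  }
  // Compact the buffered chunks, split off exactly n bytes, keep the rest.
  const all = new Uint8Array(have);
  let offset = 0;
  for (const chunk of leftover) {
    all.set(chunk, offset);
    offset += chunk.length;
  }
  leftover.length = 0;
  if (have > n) leftover.push(all.slice(n));
  return all.slice(0, n);
}
```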

coderabbitai bot commented Jan 23, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

greptile-apps bot commented Jan 23, 2026

Greptile Summary

This PR introduces a custom streaming Git packfile parser as the foundation for a hybrid R2 + D1 storage architecture. The parser handles packfile header parsing, object decompression, delta objects (both OFS and REF types), and SHA-1 checksum verification.
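
For reference, the fixed 12-byte header the parser must handle is defined by the Git pack format: a `PACK` signature, a big-endian version, and a big-endian object count. A minimal sketch (the PR's actual `PackfileParser` API may differ):

```ts
// Parse the 12-byte packfile header: "PACK" + uint32 version + uint32 count.
function parsePackHeader(bytes: Uint8Array): { version: number; objectCount: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const signature = new TextDecoder().decode(bytes.subarray(0, 4));
  if (signature !== "PACK") {
    throw new Error(`invalid packfile signature: ${signature}`);
  }
  const version = view.getUint32(4); // DataView reads big-endian by default
  if (version !== 2 && version !== 3) {
    throw new Error(`unsupported pack version: ${version}`);
  }
  return { version, objectCount: view.getUint32(8) };
}
```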

Key Changes:

  • Core Implementation: New PackfileParser class with streaming architecture using StreamBuffer for memory-efficient chunk processing
  • Hash Verification: Clever delayed-hashing strategy that excludes the trailing 20-byte checksum from hash computation (sketched after this list)
  • Testing: Comprehensive test suite that validates against real git packfiles using git verify-pack
  • CI/CD: Split CI workflow to run lint/test/typecheck before deployment approval
  • Documentation: Extensive docs on hybrid storage strategy, cost optimization (92% savings vs pure DO SQLite), and implementation roadmap
  • Cleanup: Removed unused header component and git submodules
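
A rough sketch of that delayed-hashing strategy, assuming an incremental SHA-1 interface (Web Crypto's `crypto.subtle.digest` is one-shot, so an incremental hasher, e.g. a WASM or pure-JS implementation, is assumed; all names here are hypothetical, not the PR's):

```ts
// Assumed incremental SHA-1 interface; not a Web Crypto API.
interface Sha1Hasher {
  update(data: Uint8Array): void;
  digest(): Uint8Array; // 20 bytes
}

// Always hold back the last 20 bytes seen so far: the final 20 bytes of a
// packfile are the checksum itself and must be excluded from the hash.
class TrailingChecksumHasher {
  private tail = new Uint8Array(0);

  constructor(private readonly hasher: Sha1Hasher) {}

  push(chunk: Uint8Array): void {
    const combined = new Uint8Array(this.tail.length + chunk.length);
    combined.set(this.tail, 0);
    combined.set(chunk, this.tail.length);
    if (combined.length > 20) {
      // Everything except the last 20 bytes is now safe to hash.
      this.hasher.update(combined.subarray(0, combined.length - 20));
      this.tail = combined.slice(combined.length - 20);
    } else {
      this.tail = combined;
    }
  }

  // At end of stream, the held-back tail is the expected checksum.
  verify(): boolean {
    if (this.tail.length !== 20) return false;
    const actual = this.hasher.digest();
    return actual.every((byte, i) => byte === this.tail[i]);
  }
}
```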

Issues Found:

  • Critical logic bug in decompression error handling (lines 366-373 in apps/web/src/git/pack/index.ts): the condition for detecting decompression failure is mathematically incorrect and will cause unnecessary retries

Confidence Score: 3/5

  • This PR requires fixing the decompression error handling bug before merging
  • The core packfile parser implementation is well-designed with solid test coverage and proper streaming architecture. However, there's a logic bug in the decompression error detection that could cause performance issues by unnecessarily retrying decompression up to 100 times. The bug won't cause data corruption (tests validate correctness), but it could waste CPU cycles and delay error reporting. The rest of the changes (CI, docs, config) are sound.
  • Fix the decompression logic in apps/web/src/git/pack/index.ts before merging

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/web/src/git/pack/index.ts | New streaming packfile parser with SHA-1 verification and delta object support; has decompression error handling bug |
| apps/web/src/lib/stream-buffer.ts | Memory-efficient streaming buffer with chunk compaction and optional data callbacks |
| apps/web/src/git/pack/pack.test.ts | Comprehensive test suite validating parser against actual git packfiles using git verify-pack |
| .github/workflows/pr-preview.yml | CI workflow split into separate lint/test/typecheck job before preview deployment |
| docs/data-storage.md | Architecture documentation for hybrid R2+D1 storage strategy with cost analysis |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Git Client
    participant Worker as Cloudflare Worker
    participant Cache as Cache API
    participant DO as Durable Object (SQLite)
    participant R2 as R2 Storage
    participant Parser as PackfileParser

    Client->>Worker: git clone/fetch request
    Worker->>Cache: Check for parsed commit/tree
    alt Cache Hit
        Cache-->>Worker: Return parsed object
        Worker-->>Client: Send object
    else Cache Miss
        Worker->>DO: Query object index (oid→pack_id, offset)
        DO-->>Worker: Return pack location
        Worker->>R2: Range request for object at offset
        R2-->>Worker: Stream compressed packfile chunk
        Worker->>Parser: Create PackfileParser(stream)
        Parser->>Parser: parseHeader() - validate signature
        Parser->>Parser: parseObject() - read type/size
        Parser->>Parser: decompressObject() - inflate zlib
        Parser->>Parser: Verify SHA-1 checksum
        Parser-->>Worker: Return parsed object
        Worker->>Cache: Store parsed object (1 year TTL)
        Worker-->>Client: Send object
    end

    Note over Parser: StreamBuffer manages chunks<br/>Memory-efficient compaction<br/>Delayed hash computation
```
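
For concreteness, a rough TypeScript sketch of the cache-miss path (the diagram labels the index store as a Durable Object while the summary describes D1; this sketch assumes the D1 variant, with a binding `env.DB`, an `objects` index table, and an R2 binding `env.PACKS` — binding names, schema, and key layout are all hypothetical, as the PR does not pin them down):

```ts
import type { D1Database, R2Bucket } from "@cloudflare/workers-types";

interface Env {
  DB: D1Database;   // D1: oid -> (pack_id, offset, size) index
  PACKS: R2Bucket;  // R2: packfile blobs
}

interface PackLocation {
  pack_id: string;
  offset: number;
  size: number;
}

// Cache-miss path: D1 index lookup, then an R2 range read covering just
// one object entry inside the packfile.
async function loadObjectStream(
  env: Env,
  oid: string
): Promise<ReadableStream<Uint8Array> | null> {
  const loc = await env.DB
    .prepare("SELECT pack_id, offset, size FROM objects WHERE oid = ?")
    .bind(oid)
    .first<PackLocation>();
  if (!loc) return null;

  const obj = await env.PACKS.get(`packs/${loc.pack_id}.pack`, {
    range: { offset: loc.offset, length: loc.size },
  });
  if (!obj) return null;

  // The ranged stream (entry header + zlib data) would then be handed to
  // the PR's PackfileParser, and the parsed result cached with a long TTL.
  return obj.body;
}
```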

greptile-apps bot left a comment

20 files reviewed, 1 comment

Comment on lines 366 to 373:

```ts
        if (
          this.buffer.bytesRead + bufferSize >=
          this._progress.bytesRead + bufferSize * 2
        ) {
          return Result.err(
            err instanceof Error ? err : new Error(String(err))
          );
        }
```

**logic:** the condition logic is incorrect - `this.buffer.bytesRead` should equal `this._progress.bytesRead` since they're synchronized (lines 92, 104, 150), making this condition effectively `bufferSize >= bufferSize * 2`, which is only true when `bufferSize <= 0`

this means the error branch will almost never execute, and decompression will always retry up to `maxAttempts` even when more data won't help

Suggested change:

```diff
-        if (
-          this.buffer.bytesRead + bufferSize >=
-          this._progress.bytesRead + bufferSize * 2
-        ) {
+        if (this.done && available.length < bufferSize) {
           return Result.err(
             err instanceof Error ? err : new Error(String(err))
           );
         }
```

@mdhruvil marked this pull request as draft on January 23, 2026 at 03:50