
Conversation

@mdhruvil (Owner)

This PR rewrites the whole git backend to support an R2 + D1 hybrid architecture.

- add vitest
- add stream buffer for exact byte count reading from ReadableStream (see the sketch below)
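
A minimal sketch of the exact-byte-count idea, using a hypothetical `readExactly` helper (the PR's actual `StreamBuffer` API is not shown in this thread and may differ):

```ts
// Buffer incoming chunks until exactly `n` bytes are available, carrying
// any surplus over to the next call. Names and API are hypothetical.
async function readExactly(
  reader: ReadableStreamDefaultReader<Uint8Array>,
  n: number,
  leftover: Uint8Array[] // surplus bytes carried between calls
): Promise<Uint8Array> {
  let have = leftover.reduce((sum, chunk) => sum + chunk.length, 0);
  while (have < n) {
    const { done, value } = await reader.read();
    if (done) throw new Error(`stream ended ${n - have} bytes short`);
    leftover.push(value);
    have += value.length;
  }
  // Compact the buffered chunks, split off exactly n bytes, keep the rest.
  const all = new Uint8Array(have);
  let offset = 0;
  for (const chunk of leftover) {
    all.set(chunk, offset);
    offset += chunk.length;
  }
  leftover.length = 0;
  if (have > n) leftover.push(all.slice(n));
  return all.slice(0, n);
}
```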

coderabbitai bot commented Jan 23, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

greptile-apps bot commented Jan 23, 2026

Greptile Summary

This PR introduces a custom streaming Git packfile parser as the foundation for a hybrid R2 + D1 storage architecture. The parser handles packfile header parsing, object decompression, delta objects (both OFS and REF types), and SHA-1 checksum verification.
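
For reference, the fixed 12-byte header the parser must handle is defined by the Git pack format: a `PACK` signature, a big-endian version, and a big-endian object count. A minimal sketch (the PR's actual `PackfileParser` API may differ):

```ts
// Parse the 12-byte packfile header: "PACK" + uint32 version + uint32 count.
function parsePackHeader(bytes: Uint8Array): { version: number; objectCount: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const signature = new TextDecoder().decode(bytes.subarray(0, 4));
  if (signature !== "PACK") {
    throw new Error(`invalid packfile signature: ${signature}`);
  }
  const version = view.getUint32(4); // DataView reads big-endian by default
  if (version !== 2 && version !== 3) {
    throw new Error(`unsupported pack version: ${version}`);
  }
  return { version, objectCount: view.getUint32(8) };
}
```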

Key Changes:

  • Core Implementation: New PackfileParser class with streaming architecture using StreamBuffer for memory-efficient chunk processing
  • Hash Verification: Clever delayed-hashing strategy that excludes the trailing 20-byte checksum from hash computation (sketched after this list)
  • Testing: Comprehensive test suite that validates against real git packfiles using git verify-pack
  • CI/CD: Split CI workflow to run lint/test/typecheck before deployment approval
  • Documentation: Extensive docs on hybrid storage strategy, cost optimization (92% savings vs pure DO SQLite), and implementation roadmap
  • Cleanup: Removed unused header component and git submodules
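
A rough sketch of that delayed-hashing strategy, assuming an incremental SHA-1 interface (Web Crypto's `crypto.subtle.digest` is one-shot, so an incremental hasher, e.g. a WASM or pure-JS implementation, is assumed; all names here are hypothetical, not the PR's):

```ts
// Assumed incremental SHA-1 interface; not a Web Crypto API.
interface Sha1Hasher {
  update(data: Uint8Array): void;
  digest(): Uint8Array; // 20 bytes
}

// Always hold back the last 20 bytes seen so far: the final 20 bytes of a
// packfile are the checksum itself and must be excluded from the hash.
class TrailingChecksumHasher {
  private tail = new Uint8Array(0);

  constructor(private readonly hasher: Sha1Hasher) {}

  push(chunk: Uint8Array): void {
    const combined = new Uint8Array(this.tail.length + chunk.length);
    combined.set(this.tail, 0);
    combined.set(chunk, this.tail.length);
    if (combined.length > 20) {
      // Everything except the last 20 bytes is now safe to hash.
      this.hasher.update(combined.subarray(0, combined.length - 20));
      this.tail = combined.slice(combined.length - 20);
    } else {
      this.tail = combined;
    }
  }

  // At end of stream, the held-back tail is the expected checksum.
  verify(): boolean {
    if (this.tail.length !== 20) return false;
    const actual = this.hasher.digest();
    return actual.every((byte, i) => byte === this.tail[i]);
  }
}
```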

Issues Found:

  • Critical logic bug in decompression error handling (lines 366-373 in apps/web/src/git/pack/index.ts): the condition for detecting decompression failure is mathematically incorrect and will cause unnecessary retries

Confidence Score: 3/5

  • This PR requires fixing the decompression error handling bug before merging
  • The core packfile parser implementation is well-designed with solid test coverage and proper streaming architecture. However, there's a logic bug in the decompression error detection that could cause performance issues by unnecessarily retrying decompression up to 100 times. The bug won't cause data corruption (tests validate correctness), but it could waste CPU cycles and delay error reporting. The rest of the changes (CI, docs, config) are sound.
  • Fix the decompression logic in apps/web/src/git/pack/index.ts before merging

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/web/src/git/pack/index.ts | New streaming packfile parser with SHA-1 verification and delta object support; has decompression error handling bug |
| apps/web/src/lib/stream-buffer.ts | Memory-efficient streaming buffer with chunk compaction and optional data callbacks |
| apps/web/src/git/pack/pack.test.ts | Comprehensive test suite validating parser against actual git packfiles using git verify-pack |
| .github/workflows/pr-preview.yml | CI workflow split into separate lint/test/typecheck job before preview deployment |
| docs/data-storage.md | Architecture documentation for hybrid R2+D1 storage strategy with cost analysis |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Git Client
    participant Worker as Cloudflare Worker
    participant Cache as Cache API
    participant DO as Durable Object (SQLite)
    participant R2 as R2 Storage
    participant Parser as PackfileParser

    Client->>Worker: git clone/fetch request
    Worker->>Cache: Check for parsed commit/tree
    alt Cache Hit
        Cache-->>Worker: Return parsed object
        Worker-->>Client: Send object
    else Cache Miss
        Worker->>DO: Query object index (oid→pack_id, offset)
        DO-->>Worker: Return pack location
        Worker->>R2: Range request for object at offset
        R2-->>Worker: Stream compressed packfile chunk
        Worker->>Parser: Create PackfileParser(stream)
        Parser->>Parser: parseHeader() - validate signature
        Parser->>Parser: parseObject() - read type/size
        Parser->>Parser: decompressObject() - inflate zlib
        Parser->>Parser: Verify SHA-1 checksum
        Parser-->>Worker: Return parsed object
        Worker->>Cache: Store parsed object (1 year TTL)
        Worker-->>Client: Send object
    end

    Note over Parser: StreamBuffer manages chunks<br/>Memory-efficient compaction<br/>Delayed hash computation
```
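
For concreteness, a rough TypeScript sketch of the cache-miss path (the diagram labels the index store as a Durable Object while the summary describes D1; this sketch assumes the D1 variant, with a binding `env.DB`, an `objects` index table, and an R2 binding `env.PACKS` — binding names, schema, and key layout are all hypothetical, as the PR does not pin them down):

```ts
import type { D1Database, R2Bucket } from "@cloudflare/workers-types";

interface Env {
  DB: D1Database;   // D1: oid -> (pack_id, offset, size) index
  PACKS: R2Bucket;  // R2: packfile blobs
}

interface PackLocation {
  pack_id: string;
  offset: number;
  size: number;
}

// Cache-miss path: D1 index lookup, then an R2 range read covering just
// one object entry inside the packfile.
async function loadObjectStream(
  env: Env,
  oid: string
): Promise<ReadableStream<Uint8Array> | null> {
  const loc = await env.DB
    .prepare("SELECT pack_id, offset, size FROM objects WHERE oid = ?")
    .bind(oid)
    .first<PackLocation>();
  if (!loc) return null;

  const obj = await env.PACKS.get(`packs/${loc.pack_id}.pack`, {
    range: { offset: loc.offset, length: loc.size },
  });
  if (!obj) return null;

  // The ranged stream (entry header + zlib data) would then be handed to
  // the PR's PackfileParser, and the parsed result cached with a long TTL.
  return obj.body;
}
```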

greptile-apps bot left a comment

20 files reviewed, 1 comment

Comment on lines 366 to 373:

```ts
        if (
          this.buffer.bytesRead + bufferSize >=
          this._progress.bytesRead + bufferSize * 2
        ) {
          return Result.err(
            err instanceof Error ? err : new Error(String(err))
          );
        }
```

**logic:** the condition logic is incorrect - `this.buffer.bytesRead` should equal `this._progress.bytesRead` since they're synchronized (lines 92, 104, 150), making this condition effectively `bufferSize >= bufferSize * 2`, which is only true when `bufferSize <= 0`

this means the error branch will almost never execute, and decompression will always retry up to `maxAttempts` even when more data won't help

Suggested change:

```diff
-        if (
-          this.buffer.bytesRead + bufferSize >=
-          this._progress.bytesRead + bufferSize * 2
-        ) {
+        if (this.done && available.length < bufferSize) {
           return Result.err(
             err instanceof Error ? err : new Error(String(err))
           );
         }
```

@mdhruvil marked this pull request as draft on January 23, 2026 at 03:50