feat: Shared embedding cache across workspaces with optional Cloudflare sync #101

## Problem

When `ck` is used across git worktrees or by a team, identical code content is re-embedded repeatedly:

- **Worktrees**: Creating a worktree from main triggers a full re-index, which takes minutes
- **Team**: Each developer computes the same embeddings locally
- **CI/CD**: Pipelines start with an empty cache on every run
- **Merges**: Merging a feature branch recomputes embeddings that already exist

## Proposed Solution

A content-addressed embedding cache with four parts:

1. **Local store**: A SQLite database in a "base" directory, shared across workspaces
2. **Optional remote**: Cloudflare D1 (metadata) + R2 (blobs) + Durable Objects (coordination)
3. **Safe GC**: Reference counting that checks remote references as well as local ones before deleting
4. **Sync**: Monotonic sequence numbers, per-item acknowledgment, and streaming for large payloads

## Architecture Overview

```
LOCAL:
  ~/gitspace/project/base/.ck/
    embeddings.db          # SQLite: metadata + blobs + refs
    
  ~/gitspace/project/workspaces/feature-x/.ck/
    config.toml            # base = "../../base"
    manifest.bin.zst       # File → chunk mappings
    ann_index.bin          # HNSW index

CLOUDFLARE (Optional):
  D1: metadata (hash, seq, r2_key, model_id, refs, audit_log)
  R2: embeddings/{hash}.bin (blobs, $0.015/GB/mo, no size limit)
  DO: EmbeddingCoordinator (seq numbers, compute locks, rate limits, heartbeats)
  Worker: /handshake, /embed, /push, /pull, /ref-count, /heartbeat
```
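To make the layout above concrete, here is a minimal sketch of how a workspace could locate the shared database from its `base` setting. It assumes `base` is resolved relative to the workspace root (the directory containing `.ck/`), which the layout suggests but does not state. `resolve_base_db` is a hypothetical helper, not part of `ck`.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_base_db(workspace_root: str, base: str) -> PurePosixPath:
    """Return the path of the shared embeddings.db for a workspace.

    Assumption: `base` (from .ck/config.toml) is relative to the workspace
    root, not to the .ck/ directory itself.
    """
    # Purely lexical normalization: collapses ".." without touching the disk.
    base_dir = posixpath.normpath(posixpath.join(workspace_root, base))
    return PurePosixPath(base_dir) / ".ck" / "embeddings.db"


workspace = "/home/dev/gitspace/project/workspaces/feature-x"
print(resolve_base_db(workspace, "../../base"))
# /home/dev/gitspace/project/base/.ck/embeddings.db
```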

## Key Design Decisions

### Content-Addressed Storage
- Hash = `blake3(model_id || model_version || content)`
- Same content = same hash = reuse embedding
- Model changes = different hash = no accidental mixing
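
A minimal sketch of the key derivation, under two assumptions that are not part of the proposal: it uses `hashlib.blake2b` from the Python standard library as a stand-in for BLAKE3 (which needs a third-party package), and it length-prefixes each field so that plain concatenation cannot make `("ab", "c")` and `("a", "bc")` collide. `cache_key` is a hypothetical name.

```python
import hashlib


def cache_key(model_id: str, model_version: str, content: bytes) -> str:
    """Derive a content-addressed key from model identity and chunk content.

    Stand-in: BLAKE2b instead of BLAKE3.
    Assumption: every field is prefixed with its length so that field
    boundaries stay unambiguous after concatenation.
    """
    h = hashlib.blake2b(digest_size=32)
    for field in (model_id.encode("utf-8"), model_version.encode("utf-8"), content):
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    return h.hexdigest()


chunk = b"fn main() {}"
same = cache_key("example-model", "1.5", chunk) == cache_key("example-model", "1.5", chunk)
changed = cache_key("example-model", "1.5", chunk) == cache_key("example-model", "2.0", chunk)
print(same, changed)  # True False
```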

### Reference Counting for Safe GC
```sql
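-- Step 1 (when remote sync is configured): ask the remote which candidate
-- hashes it still references, and exclude those from the delete:
--   POST /api/ref-count { hashes: [...] }

-- Step 2: delete only hashes with no local AND no remote refs
DELETE FROM embeddings
WHERE hash NOT IN (SELECT hash FROM refs)
  AND created_at < unixepoch() - 3600;  -- 1hr grace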
```
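
The GC pass could be sketched end to end as below. The table layout, the `gc_unreferenced` helper, and the `remote_referenced` set (standing in for the `/api/ref-count` response) are assumptions for illustration. The sketch passes the cutoff as a parameter because `unixepoch()` needs SQLite 3.38 or newer, and uses `NOT EXISTS` because `NOT IN` matches nothing if the subquery ever returns a NULL.

```python
import sqlite3
import time

# Hypothetical schema; the real one lives in the full RFC.
SCHEMA = """
CREATE TABLE embeddings (hash TEXT PRIMARY KEY, created_at INTEGER NOT NULL, vector BLOB);
CREATE TABLE refs (workspace TEXT NOT NULL, hash TEXT NOT NULL, PRIMARY KEY (workspace, hash));
"""


def gc_unreferenced(conn, remote_referenced=frozenset(), grace_seconds=3600, now=None):
    """Delete embeddings with no local refs, no remote refs, and past the grace period."""
    now = int(time.time()) if now is None else now
    cutoff = now - grace_seconds
    candidates = [row[0] for row in conn.execute(
        """
        SELECT e.hash FROM embeddings AS e
        WHERE e.created_at < ?
          AND NOT EXISTS (SELECT 1 FROM refs AS r WHERE r.hash = e.hash)
        ORDER BY e.hash
        """,
        (cutoff,),
    )]
    # Keep anything the remote still references.
    doomed = [h for h in candidates if h not in remote_referenced]
    with conn:  # one transaction for the whole batch
        conn.executemany("DELETE FROM embeddings WHERE hash = ?", [(h,) for h in doomed])
    return doomed


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = 1_700_000_000
conn.executemany("INSERT INTO embeddings VALUES (?, ?, NULL)", [
    ("old-unreferenced", now - 7200),
    ("old-local-ref", now - 7200),
    ("old-remote-ref", now - 7200),
    ("new-unreferenced", now - 60),  # inside the 1hr grace period
])
conn.execute("INSERT INTO refs VALUES ('feature-x', 'old-local-ref')")
deleted = gc_unreferenced(conn, remote_referenced={"old-remote-ref"}, now=now)
print(deleted)  # ['old-unreferenced']
```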

### Monotonic Sequence Numbers for Sync
- A Durable Object maintains the global sequence counter
- Clients sync by sequence number, not timestamp, which avoids clock-skew issues
- Each pushed item is acknowledged individually, so a partial failure only requires retrying the failed items
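
A rough in-memory model of the protocol described in this section, to show how a sequence cursor and per-item acks interact. `Remote`, the ack strings (`stored`, `duplicate`, `rejected`), and the rule that an empty blob is rejected are all assumptions made for the sketch; the actual wire format is in the full RFC.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Stand-in for the D1 metadata table plus the Durable Object counter."""
    seq: int = 0
    rows: dict = field(default_factory=dict)  # hash -> (seq, blob)

    def push(self, items):
        """Store each item independently and return one ack per hash."""
        acks = {}
        for h, blob in items.items():
            if not blob:
                acks[h] = "rejected"   # simulated failure; other items still succeed
            elif h in self.rows:
                acks[h] = "duplicate"  # content-addressed, so nothing to do
            else:
                self.seq += 1          # sequence is assigned by the remote, never the client
                self.rows[h] = (self.seq, blob)
                acks[h] = "stored"
        return acks

    def pull(self, after_seq, limit=100):
        """Return items with seq > after_seq, plus the cursor to use next time."""
        batch = sorted((s, h, b) for h, (s, b) in self.rows.items() if s > after_seq)[:limit]
        return batch, (batch[-1][0] if batch else after_seq)


remote = Remote()
acks = remote.push({"a": b"\x01", "b": b"", "c": b"\x03"})
retry = [h for h, status in acks.items() if status == "rejected"]
batch, cursor = remote.pull(after_seq=0)
print(acks)    # {'a': 'stored', 'b': 'rejected', 'c': 'stored'}
print(retry)   # ['b']
print(cursor)  # 2
```

Because the client stores only the last sequence number it has seen, its local clock never enters the comparison.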

### Compute Locks (Cloudflare)
- A Durable Object lock prevents duplicate computation when two clients need the same embedding
- The first client acquires the lock, computes the embedding, and stores it
- The second client waits for the lock, then reads the result from the cache
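
The same pattern can be shown in-process with threads. This is only an analogy for the Durable Object lock, not Worker code: `ComputeCoordinator` is a hypothetical class, and a distributed version would also need a lock expiry for clients that crash while holding the lock, which this sketch leaves out.

```python
import threading
import time


class ComputeCoordinator:
    """Per-key lock so that each embedding is computed at most once."""

    def __init__(self):
        self._cache = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def get_or_compute(self, key, compute):
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            with self._guard:
                if key in self._cache:   # later callers: served from cache
                    return self._cache[key]
            value = compute()            # first caller: does the work once
            with self._guard:
                self._cache[key] = value
            return value


calls = []

def slow_embed():
    calls.append(1)
    time.sleep(0.05)  # simulate model inference
    return [0.1, 0.2, 0.3]


coordinator = ComputeCoordinator()
results = []
threads = [
    threading.Thread(target=lambda: results.append(coordinator.get_or_compute("hash-abc", slow_embed)))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))  # 1 4
```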

## CLI Commands

```bash
# Workspace management
ck init --base ../../base         # Link workspace to base
ck workspace list                 # Show registered workspaces

# Cache management  
ck cache stats                    # Size, hit rate, workspaces
ck cache gc                       # Clean unreferenced embeddings
ck cache gc --dry-run             # Preview deletions

# Cloudflare sync (when configured)
ck cache push                     # Push new embeddings
ck cache pull                     # Pull from remote
ck cache sync                     # Bidirectional

# Diagnostics
ck doctor                         # Comprehensive health check
```

## Migration Path

### Phase 1: Local Shared Cache
- SQLite in base directory
- Reference counting GC
- Workspace registration
- Config: `base = "../../base"`

### Phase 2: Cloudflare Sync
- D1 for metadata, R2 for blobs
- Durable Object for coordination
- Push/pull commands
- Per-user JWT auth

### Phase 3: Team Features
- `ck cloudflare join` onboarding wizard
- `ck doctor` diagnostics
- Usage analytics

## Cost Estimate (Cloudflare, 10 devs, 1M embeddings)

| Resource | Cost |
|----------|------|
| R2 Storage (~1.5GB) | $0.02/mo |
| D1 Storage (~100MB) | $0.08/mo |
| D1 + Workers ops | ~$1/mo |
| Workers AI (optional) | ~$3/mo |
| **Total** | **~$5-20/mo** |
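
The R2 storage row can be reproduced with a short calculation. The 384-dimension vector size is an assumption (the issue does not state the embedding dimension); it is chosen because it gives roughly the ~1.5GB figure in the table. The R2 price is the $0.015/GB/mo quoted in the architecture overview.

```python
EMBEDDINGS = 1_000_000
DIMENSIONS = 384               # assumption: not stated in the issue
R2_USD_PER_GB_MONTH = 0.015    # from the architecture overview


def storage_gb(count: int, dims: int, bytes_per_value: int) -> float:
    """Raw vector storage in decimal gigabytes, ignoring metadata overhead."""
    return count * dims * bytes_per_value / 1e9


f32_gb = storage_gb(EMBEDDINGS, DIMENSIONS, 4)   # Float32: 4 bytes per value
int8_gb = storage_gb(EMBEDDINGS, DIMENSIONS, 1)  # Int8: 1 byte per value

print(f"float32: {f32_gb:.3f} GB -> ${f32_gb * R2_USD_PER_GB_MONTH:.3f}/mo")
print(f"int8:    {int8_gb:.3f} GB -> ${int8_gb * R2_USD_PER_GB_MONTH:.3f}/mo")
# float32: 1.536 GB -> $0.023/mo
# int8:    0.384 GB -> $0.006/mo
```

At this scale the blob storage cost is negligible either way, so the operations and Workers AI rows dominate the estimate.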

## Open Questions

1. **Embedding format**: Store Float32, or quantized Int8 (4x smaller)?
2. **Compression**: Zstd-compress blobs before writing them to R2?
3. **Model migration**: What tooling is needed when the embedding model changes?
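
For the first question, here is one possible shape of the Int8 option: symmetric per-vector scalar quantization, where each vector stores one float scale plus one signed byte per dimension. This is just one scheme among several and is not proposed by the issue; `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
def quantize_int8(vec):
    """Symmetric per-vector quantization.

    Returns (scale, values) where every value is an int in [-127, 127]
    and vec[i] is approximately values[i] * scale.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return scale, [round(x / scale) for x in vec]


def dequantize_int8(scale, values):
    """Recover approximate floats from the quantized form."""
    return [v * scale for v in values]


vector = [0.12, -0.98, 0.33, 0.0, 0.57]
scale, quantized = quantize_int8(vector)
restored = dequantize_int8(scale, quantized)
worst_error = max(abs(a - b) for a, b in zip(vector, restored))
print(quantized)
print(worst_error <= scale / 2 + 1e-12)  # True: error is bounded by half a step
```

The trade-off is a per-value rounding error of at most half the scale, so whether retrieval quality holds up would need to be measured against the Float32 baseline.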

## Full RFC

A comprehensive RFC with schemas, Worker code, sync protocol details, and security considerations is available. Happy to share if helpful for discussion.

---

This would significantly improve the workflow for:
- Developers using git worktrees
- Teams sharing codebase understanding
- CI/CD pipelines with warm caches
