WhatsApp user-proxy frontend via Baileys bridge (linked-device, nanobot-style) #436

## Summary

Add a WhatsApp `IUserFrontend` implementation so RockBot can act as a conversational agent over WhatsApp for a single authorized user, using the **WhatsApp Web (linked-device) protocol via Baileys** — the same approach [nanobot](https://github.com/HKUDS/nanobot) uses.

Sibling to #295 (Discord user-proxy). Replaces the previously filed #435, which sketched Meta's official Cloud API: the wrong approach for this use case.

## Motivation

A colleague asked for WhatsApp support that behaves "like nanobot or OpenClaw do today." Investigating nanobot's implementation revealed they use a Node.js bridge running `@whiskeysockets/baileys`, with QR-code linked-device auth against the user's **personal** WhatsApp number. No Meta business account, no message templates, no 24-hour reply window — and groups, media, and voice "just work."

The Cloud API path (#435) was the wrong shape for this ask. It requires a separate business phone number, business verification, and pre-approved message templates for any agent-initiated message sent outside the 24-hour window (more than 24h after the user's last message). Useful for a different use case, not this one.

## Trade-offs (explicit, eyes-open)

**Pros**
- Free; no Meta Business setup, no verification, no per-message billing
- The linked personal number is the bot's identity: agent replies appear as WhatsApp messages sent from that number (from the user themselves, if they link their own)
- Full bidirectional chat including DMs, groups, media, voice (with transcription)
- No 24-hour window / message-template constraint
- Architecturally clean: bridge subprocess isolates the protocol library from the agent core

**Cons** (these are real — accept them or don't build this)
- **Against WhatsApp's Terms of Service.** Accounts using Baileys have been banned. Risk is unpredictable but non-zero
- Linked-device sessions can be revoked by WhatsApp at any time, forcing re-scan
- Baileys is reverse-engineered — protocol changes upstream can break the bridge until Baileys catches up
- Adds a Node.js >=20 runtime dependency to the userproxy deployment (the bridge container)
- The rationale that put Signal out of scope in #295 ("signal-cli is unofficial and fragile") arguably applies here too. Decide whether the value justifies relaxing it

## Design

Two-process architecture mirroring nanobot:

### `bridge/` — Node.js subprocess (new)

- Standalone TypeScript project that builds to a self-contained artifact, shipped in its own bridge image (see Deployment)
- Owns `@whiskeysockets/baileys`, the WhatsApp Web protocol, and the linked-device session state on disk
- Exposes a local-only WebSocket server (loopback bind, shared-secret auth token)
- Simple JSON protocol — both directions:
  - Outbound `{type:"send", to, text}`, `{type:"send_media", to, filePath, mimetype, fileName}`
  - Inbound `{type:"message", pn, sender, content, id, isGroup, wasMentioned, media[], timestamp}`
  - Lifecycle: `{type:"auth", token}`, `{type:"status", status}`, `{type:"qr", ...}`, `{type:"error", ...}`
- On first run: prints a QR code to stdout for the user to scan via WhatsApp → Settings → Linked Devices
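The frame vocabulary above could be pinned down as a TypeScript discriminated union shared by the bridge and its tests. This is a minimal sketch, assuming field types (strings, booleans, Unix-second timestamps, media as file paths) that the issue does not spell out; `parseInboundFrame` and `encodeOutbound` are hypothetical names, and the `qr`/`error` payloads stay open-ended because the issue elides them.

```typescript
// Hypothetical TypeScript model of the bridge <-> frontend JSON protocol.
// Field names come from the issue; field types are assumptions.

/** Frames sent by the .NET frontend to the bridge. */
type OutboundFrame =
  | { type: "auth"; token: string }
  | { type: "send"; to: string; text: string }
  | { type: "send_media"; to: string; filePath: string; mimetype: string; fileName: string };

/** Frames sent by the bridge to the frontend. */
type InboundFrame =
  | {
      type: "message";
      pn: string;          // author's phone number/JID when known (assumed; may be "")
      sender: string;      // author JID, phone or LID form
      content: string;
      id: string;          // WhatsApp message ID, used for dedup
      isGroup: boolean;
      wasMentioned: boolean;
      media: string[];     // assumed: file paths under MediaDownloadPath
      timestamp: number;   // assumed: Unix seconds
    }
  | { type: "status"; status: string }
  | { type: "qr"; [key: string]: unknown }     // payload elided in the issue
  | { type: "error"; [key: string]: unknown }; // payload elided in the issue

function encodeOutbound(frame: OutboundFrame): string {
  return JSON.stringify(frame);
}

/** Parse and minimally validate a bridge frame; returns null for anything unexpected. */
function parseInboundFrame(raw: string): InboundFrame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const f = data as Record<string, unknown>;
  switch (f.type) {
    case "message": {
      const ok =
        typeof f.sender === "string" &&
        typeof f.id === "string" &&
        typeof f.content === "string" &&
        typeof f.isGroup === "boolean";
      return ok ? (f as unknown as InboundFrame) : null;
    }
    case "status":
      return typeof f.status === "string" ? (f as unknown as InboundFrame) : null;
    case "qr":
    case "error":
      return f as unknown as InboundFrame;
    default:
      return null; // unknown frame types are dropped, not trusted
  }
}
```

Rejecting unknown `type` values keeps the frontend from acting on frames a newer bridge build might add before the .NET side understands them.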

### `src/RockBot.UserProxy.WhatsApp/` — .NET frontend (new)

Implements `IUserFrontend`:
- Connects to the local bridge over WebSocket on startup, sends `{type:"auth", token: <shared secret>}`
- **Outbound**: `DisplayReplyAsync` → JSON `send` frame to the bridge
- **Inbound**: listens for `message` frames, applies the sender allowlist, publishes `UserMessage` to `user.message` on the bus — same path CLI/Blazor frontends use today
- Handles both `@s.whatsapp.net` (legacy phone) and `@lid` (newer LID) identity formats, mapping LIDs to phone numbers as nanobot does
- 1000-entry LRU for message-ID dedup
- Auto-reconnect with backoff if the bridge connection drops
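The dedup cache and reconnect backoff are small enough to sketch directly. The frontend itself is .NET; the logic below is written in TypeScript for consistency with the bridge and should port one-to-one. `MessageIdDedup`, `reconnectDelayMs`, and the 500 ms base / 30 s cap are illustrative assumptions, not settled values.

```typescript
// Illustrative TypeScript port of two frontend helpers (the real frontend is .NET).
// Names and default values are hypothetical.

/** Bounded LRU set of recently seen message IDs (capacity 1000 per the issue). */
class MessageIdDedup {
  private readonly seen = new Map<string, true>();
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  /** True the first time an ID is seen; false for a re-delivered frame. */
  firstSighting(id: string): boolean {
    if (this.seen.has(id)) {
      // Refresh recency: a Map iterates in insertion order, so re-inserting moves it to the end.
      this.seen.delete(id);
      this.seen.set(id, true);
      return false;
    }
    this.seen.set(id, true);
    if (this.seen.size > this.capacity) {
      // Evict the least recently used ID (the first key in iteration order).
      const oldest = this.seen.keys().next().value as string;
      this.seen.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.seen.size;
  }
}

/** Capped exponential backoff with full jitter for reconnect attempt 0, 1, 2, ... */
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In the receive loop, a frame whose `id` fails `firstSighting` would be dropped before the allowlist check, and the attempt counter would reset once the bridge accepts the `auth` frame. With a single frontend per bridge, a plain capped exponential without jitter would work equally well.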

### Configuration (`WhatsAppOptions`)

- `BridgeUrl` (default `ws://localhost:3001`) — bridge WebSocket endpoint
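- `BridgeToken` — shared secret for bridge auth. If blank, one is generated on first run; in Kubernetes, supply it to both containers from a Secret instead, since Secret volumes are mounted read-only and a generated token cannot be persisted back to them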
- `AuthorizedPhoneNumbers` — E.164 allowlist (e.g. `["+1234567890"]`); messages from any other sender are dropped before reaching the bus
- `GroupPolicy` — `Open` (respond to all group messages) or `Mention` (only when @mentioned). Direct messages always respond if sender is allowed
- `MediaDownloadPath` — where the bridge writes inbound media (PVC mount)
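The allowlist and `GroupPolicy` rules reduce to one predicate evaluated before anything is published to `user.message`. A sketch, assuming `sender` is the message author's JID, `pn` carries the author's phone number when the bridge knows it, and LID JIDs use Baileys' `@lid` server; `resolvePhone`, `shouldForward`, and the in-memory `lidToPhone` cache are hypothetical.

```typescript
// Hypothetical pre-bus filter. Option names mirror WhatsAppOptions; everything else is assumed.

type GroupPolicy = "Open" | "Mention";

interface WhatsAppFilterOptions {
  authorizedPhoneNumbers: string[]; // E.164, e.g. "+1234567890"
  groupPolicy: GroupPolicy;
}

interface InboundMessage {
  pn: string;        // author's phone number/JID if the bridge knows it, else ""
  sender: string;    // author JID: "<digits>@s.whatsapp.net" or "<lid>@lid"
  isGroup: boolean;
  wasMentioned: boolean;
}

/** LID user -> E.164 phone, learned whenever a frame carries both. */
const lidToPhone = new Map<string, string>();

/** Split "user:device@server" into its user (device suffix dropped) and server parts. */
function splitJid(jid: string): { user: string; server: string } {
  const at = jid.indexOf("@");
  const userPart = at === -1 ? jid : jid.slice(0, at);
  const server = at === -1 ? "" : jid.slice(at + 1);
  const colon = userPart.indexOf(":");
  return { user: colon === -1 ? userPart : userPart.slice(0, colon), server };
}

/** Normalize "+15551234567", "15551234567" or "15551234567@s.whatsapp.net" to E.164. */
function toE164(value: string): string | null {
  const digits = splitJid(value).user.replace(/^\+/, "");
  return /^[1-9]\d{1,14}$/.test(digits) ? `+${digits}` : null;
}

/** Resolve the author's phone number from either JID form; null if unknown. */
function resolvePhone(msg: InboundMessage): string | null {
  const { user, server } = splitJid(msg.sender);
  if (server === "s.whatsapp.net") return toE164(user);
  if (server === "lid") {
    const fromPn = msg.pn ? toE164(msg.pn) : null;
    if (fromPn !== null) {
      lidToPhone.set(user, fromPn); // cache the mapping for later frames without pn
      return fromPn;
    }
    return lidToPhone.get(user) ?? null;
  }
  return null; // groups, broadcasts, anything unrecognized
}

/** True if the message should be published to user.message. */
function shouldForward(msg: InboundMessage, opts: WhatsAppFilterOptions): boolean {
  const phone = resolvePhone(msg);
  if (phone === null || !opts.authorizedPhoneNumbers.includes(phone)) return false;
  if (!msg.isGroup) return true; // DMs: always answer an allowed sender
  return opts.groupPolicy === "Open" || msg.wasMentioned;
}
```

An unresolvable LID fails closed: the message is dropped rather than guessed at, which keeps the "unauthorized numbers never produce a `UserMessage`" guarantee.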

### Least-privilege / "nothing trusts the LLM" alignment

- Bridge subprocess is the **only** component with the Baileys dependency and the linked-device session — minimal Baileys blast radius
- Bridge binds loopback-only and requires a token; .NET frontend speaks only JSON to it
- Sender filtering happens in the .NET frontend **before** bus publish — unauthorized numbers never produce a `UserMessage`
- Bus events published by the frontend carry `source: whatsapp` + `principal: <phone>` (mirrors #295)
- If the bridge crashes or gets revoked by WhatsApp, blast radius is one process and one linked-device session
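On the bridge side, the loopback bind and shared token come down to two checks on every new WebSocket connection. A minimal sketch using Node's built-in `node:crypto` (`randomBytes`, `createHash`, `timingSafeEqual`); the function names are hypothetical, and wiring them into a WebSocket server is omitted so the block has no third-party dependency.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical bridge-side connection checks; wiring into a WebSocket server is omitted.

/** 256-bit random token, hex-encoded (64 chars). */
function generateBridgeToken(): string {
  return randomBytes(32).toString("hex");
}

/** Constant-time compare. Hashing first gives equal-length buffers, which timingSafeEqual requires. */
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

/** The first frame on a connection must be {type:"auth", token} with the right token. */
function authenticateFirstFrame(raw: string, expected: string): boolean {
  if (expected.length === 0) return false; // never accept an unset token
  let frame: unknown;
  try {
    frame = JSON.parse(raw);
  } catch {
    return false;
  }
  if (typeof frame !== "object" || frame === null) return false;
  const f = frame as { type?: unknown; token?: unknown };
  return f.type === "auth" && typeof f.token === "string" && tokensMatch(f.token, expected);
}

/** Defense in depth behind the loopback bind: reject non-loopback peers. */
function isLoopbackPeer(remoteAddress: string | undefined): boolean {
  return remoteAddress === "127.0.0.1" || remoteAddress === "::1" || remoteAddress === "::ffff:127.0.0.1";
}
```

A connection failing either check would simply be closed, so the bridge never processes a `send` or `send_media` frame from an unauthenticated peer.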

## Deployment

- New Helm subchart in `deploy/helm/rockbot/`, deployed as a single pod with two containers (sidecar pattern):
  - **bridge** — Node.js container running the WhatsApp bridge
  - **frontend** — .NET container running `RockBot.UserProxy.WhatsApp`
  - Bridge and frontend communicate over loopback inside the pod
- New Dockerfile per container: `deploy/Dockerfile.whatsapp-bridge`, `deploy/Dockerfile.userproxy-whatsapp`
- **PVC required** for bridge session state (`/data/whatsapp-auth`) so a pod restart doesn't force re-scanning the QR. Same PVC also hosts inbound media
- **No public ingress needed** — bridge dials WhatsApp servers outbound; no webhook
- **First-run QR flow**:
  - Bridge container starts and prints the QR to its stdout (`kubectl logs`)
  - User opens WhatsApp → Settings → Linked Devices → scans QR
  - Session persists to PVC; subsequent restarts skip the QR step

## Out of scope (future issues)

- **Multi-user** — single authorized phone number only; multi-tenant needs per-principal isolation
- **Message templates / Cloud API** — see closed #435 if proactive alerts outside the 24-hour window are ever wanted
- **Rich features** — read receipts, typing indicators, reactions, interactive list/button messages
- **Outbound voice / audio synthesis** — text and media only

## Acceptance criteria

- [ ] New Node.js `bridge/` project using `@whiskeysockets/baileys`, building to a deployable artifact
- [ ] Local-only WebSocket server in the bridge with shared-token auth
- [ ] New `RockBot.UserProxy.WhatsApp` project implementing `IUserFrontend`
- [ ] Outbound: agent replies sent via bridge `send` / `send_media` frames
- [ ] Inbound: bridge `message` frames forwarded to `user.message` after allowlist filter
- [ ] Messages from unauthorized phone numbers are dropped (never reach the bus)
- [ ] Bus events tagged with `source: whatsapp` + `principal: <phone>` metadata
- [ ] LID and legacy phone JID formats both handled; LID→phone mapping cached
- [ ] Message-ID dedup so re-delivered frames don't double-process
- [ ] Voice messages transcribed before reaching the agent (via existing transcription path if one exists, otherwise document the gap)
- [ ] Auto-reconnect to the bridge with backoff
- [ ] Unit tests for the .NET frontend (mock WebSocket / bridge)
- [ ] Dockerfile + Helm wiring including PVC for `whatsapp-auth/`
- [ ] README documenting linked-device setup (QR scan), ToS risk acknowledgement, secret configuration, group policy

## Open questions

- **Bridge implementation language** — Node.js (matches nanobot, Baileys is the reference implementation) vs. trying to port to a .NET WhatsApp Web client. Node is the pragmatic answer; the bridge is small and contained
- **Voice transcription** — does RockBot already have a transcription path the bridge could call out to, or do we add one? Nanobot transcribes server-side before invoking the agent
- **Single pod vs. two pods** — sidecar pattern (one pod, two containers) keeps the loopback boundary trivial; two pods would expose the bridge token over the pod network. Lean sidecar
- **Session loss handling** — when WhatsApp revokes the linked device, the bridge needs to surface that loudly. Should this fail the pod's readiness probe so the user notices, or silently log and serve an error via `DisplayErrorAsync`? Probably readiness-fail
- **ToS acceptance** — should this require an explicit `consentAcknowledged: true` config flag (like nanobot's email channel) before the bridge will start, forcing the deployer to confirm they understand the ban risk?
