Add a WhatsApp IUserFrontend implementation so RockBot can act as a conversational agent over WhatsApp for a single authorized user, using the WhatsApp Web (linked-device) protocol via Baileys — the same approach nanobot uses.
Sibling to #295 (Discord user-proxy). Replaces the previously filed #435, which sketched Meta's official Cloud API (the wrong approach for this use case).
Motivation
A colleague asked for WhatsApp support that behaves "like nanobot or OpenClaw do today." Investigating nanobot's implementation revealed they use a Node.js bridge running @whiskeysockets/baileys, with QR-code linked-device auth against the user's personal WhatsApp number. No Meta business account, no message templates, no 24-hour reply window — and groups, media, and voice "just work."
The Cloud API path (#435) was the wrong shape for this ask. It requires a separate business phone number, business verification, and pre-approved templates for any agent-initiated message older than 24h. Useful for a different use case, not this one.
Trade-offs (explicit, eyes-open)
Pros
Free; no Meta Business setup, no verification, no per-message billing
Personal phone number is the bot's identity — the user sees WhatsApp messages from "themselves" or whatever number they linked
Full bidirectional chat including DMs, groups, media, voice (with transcription)
No 24-hour window / message-template constraint
Architecturally clean: bridge subprocess isolates the protocol library from the agent core
Cons (these are real — accept them or don't build this)
Against WhatsApp's Terms of Service. Accounts using Baileys have been banned. Risk is unpredictable but non-zero
Linked-device sessions can be revoked by WhatsApp at any time, forcing re-scan
Baileys is reverse-engineered — protocol changes upstream can break the bridge until Baileys catches up
Adds a Node.js >=20 runtime dependency to the userproxy container
src/RockBot.UserProxy.WhatsApp/ (.NET frontend, new) implements IUserFrontend:
Connects to the local bridge over WebSocket on startup, sends {type:"auth", token: <shared secret>}
Outbound: DisplayReplyAsync → JSON send frame to the bridge
Inbound: listens for message frames, applies the sender allowlist, publishes UserMessage to user.message on the bus — same path CLI/Blazor frontends use today
Handles both @s.whatsapp.net (legacy phone) and @lid.whatsapp.net (new LID) identity formats, with LID→phone mapping like nanobot does
1000-entry LRU for message-ID dedup
Auto-reconnect with backoff if the bridge connection drops
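The dedup and reconnect items above could look roughly like the following. This is a minimal sketch written in TypeScript for brevity (the real frontend would be .NET); `SeenMessageIds`, `reconnectDelayMs`, and the 1 s base / 30 s cap are hypothetical choices, not anything the issue specifies beyond the 1000-entry LRU.

```typescript
// Sketch only: class and function names are hypothetical, not RockBot APIs.

/** Bounded set of recently seen message IDs with least-recently-used eviction. */
class SeenMessageIds {
  // A Map iterates in insertion order, so its first key is the oldest entry.
  private readonly ids = new Map<string, true>();
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  /** Returns true the first time an ID is seen and false for a re-delivery. */
  markIfNew(id: string): boolean {
    if (this.ids.has(id)) {
      // Re-inserting moves the ID to the "most recent" end of the Map.
      this.ids.delete(id);
      this.ids.set(id, true);
      return false;
    }
    this.ids.set(id, true);
    if (this.ids.size > this.capacity) {
      const oldest = this.ids.keys().next().value;
      if (oldest !== undefined) this.ids.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.ids.size;
  }
}

/** Exponential backoff delay for reconnect attempt 0, 1, 2, ... capped at maxMs. */
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed base delay
  maxMs: number = 30000, // assumed cap
): number {
  // Clamp the exponent so the power stays finite for very large attempt counts.
  const exponent = Math.min(Math.max(attempt, 0), 20);
  return Math.min(maxMs, baseMs * 2 ** exponent);
}
```

A frontend would call `markIfNew(frame.id)` before publishing to the bus and skip the frame when it returns false. Adding random jitter to the delay is a common refinement that is left out here.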
Out of scope (future issues)
Outbound voice / audio synthesis (text and media only)
Acceptance criteria
New Node.js bridge/ project using @whiskeysockets/baileys, building to a deployable artifact
Local-only WebSocket server in the bridge with shared-token auth
New RockBot.UserProxy.WhatsApp project implementing IUserFrontend
Outbound: agent replies sent via bridge send / send_media frames
Inbound: bridge message frames forwarded to user.message after allowlist filter
Messages from unauthorized phone numbers are dropped (never reach the bus)
Bus events tagged with source: whatsapp + principal: <phone> metadata
LID and legacy phone JID formats both handled; LID→phone mapping cached
Message-ID dedup so re-delivered frames don't double-process
Voice messages transcribed before reaching the agent (via existing transcription path if one exists, otherwise document the gap)
Auto-reconnect to the bridge with backoff
Unit tests for the .NET frontend (mock WebSocket / bridge)
Dockerfile + Helm wiring including PVC for whatsapp-auth/
README documenting linked-device setup (QR scan), ToS risk acknowledgement, secret configuration, group policy
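To make the allowlist and LID criteria above concrete, here is a hedged sketch of sender resolution. It is written in TypeScript for brevity (the real frontend would be .NET), and `parseJid` and `SenderResolver` are hypothetical names. The `@s.whatsapp.net` and `@lid.whatsapp.net` suffixes are taken from this issue; accepting a bare `@lid` suffix as well is an assumption about what the bridge may emit, as is learning the LID→phone pairing from a frame that carries both identities.

```typescript
// Sketch only: names below are hypothetical, not RockBot or Baileys APIs.
type Identity =
  | { kind: "phone"; e164: string }
  | { kind: "lid"; lid: string }
  | { kind: "unknown" };

/** Classify a JID as a legacy phone identity, a LID identity, or unknown. */
function parseJid(jid: string): Identity {
  const at = jid.indexOf("@");
  if (at <= 0) return { kind: "unknown" };
  const user = jid.slice(0, at);
  const server = jid.slice(at + 1);
  if (server === "s.whatsapp.net" && /^\d+$/.test(user)) {
    return { kind: "phone", e164: "+" + user };
  }
  // "lid" alone is an assumed alternative spelling of the LID server.
  if (server === "lid.whatsapp.net" || server === "lid") {
    return { kind: "lid", lid: user };
  }
  return { kind: "unknown" };
}

/** Resolves a sender JID to an allowlisted E.164 number, caching LID mappings. */
class SenderResolver {
  private readonly lidToPhone = new Map<string, string>();
  private readonly allowed: Set<string>;

  constructor(authorizedPhoneNumbers: string[]) {
    this.allowed = new Set(authorizedPhoneNumbers);
  }

  /** Remember a LID -> phone pairing when both identities are known. */
  learn(lidJid: string, phoneJid: string): void {
    const lid = parseJid(lidJid);
    const phone = parseJid(phoneJid);
    if (lid.kind === "lid" && phone.kind === "phone") {
      this.lidToPhone.set(lid.lid, phone.e164);
    }
  }

  /** Returns the E.164 number if the sender is allowlisted, otherwise null. */
  authorize(senderJid: string): string | null {
    const id = parseJid(senderJid);
    let e164: string | undefined;
    if (id.kind === "phone") e164 = id.e164;
    else if (id.kind === "lid") e164 = this.lidToPhone.get(id.lid);
    if (e164 === undefined) return null; // unknown or unmapped identity: drop
    return this.allowed.has(e164) ? e164 : null;
  }
}
```

A null result means the frame is dropped before anything is published, so an unmapped LID fails closed rather than reaching the bus.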
Open questions
Bridge implementation language — Node.js (matches nanobot, Baileys is the reference implementation) vs. trying to port to a .NET WhatsApp Web client. Node is the pragmatic answer; the bridge is small and contained
Voice transcription — does RockBot already have a transcription path the bridge could call out to, or do we add one? Nanobot transcribes server-side before invoking the agent
Single pod vs. two pods — sidecar pattern (one pod, two containers) keeps the loopback boundary trivial; two pods would expose the bridge token over the pod network. Lean sidecar
Session loss handling — when WhatsApp revokes the linked device, the bridge needs to surface that loudly. Should this fail the pod's readiness probe so the user notices, or silently log and serve an error via DisplayErrorAsync? Probably readiness-fail
ToS acceptance — should this require an explicit consentAcknowledged: true config flag (like nanobot's email channel) before the bridge will start, forcing the deployer to confirm they understand the ban risk?
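If the last two questions were answered with "readiness-fail" and "require the flag", the checks could be as small as the sketch below. Everything here is hypothetical: only the `consentAcknowledged` name comes from this issue, and the status values are invented for illustration because the issue does not list what the bridge's status frame carries.

```typescript
// Sketch only: hypothetical shapes, not a decided design.
interface BridgeStartupOptions {
  consentAcknowledged?: boolean;
}

/** Refuse to start unless the deployer has explicitly accepted the ToS risk. */
function assertConsent(options: BridgeStartupOptions): void {
  if (options.consentAcknowledged !== true) {
    throw new Error(
      "Refusing to start: set consentAcknowledged: true to confirm the WhatsApp ToS and ban risk is understood.",
    );
  }
}

// Assumed status values; the issue only names a {type:"status", status} frame.
type SessionStatus = "connecting" | "connected" | "logged_out";

/** Readiness is true only while the linked-device session is live. */
function isReady(status: SessionStatus): boolean {
  return status === "connected";
}
```

Wiring `isReady` into the pod's readiness probe would make a revoked session visible without anyone having to read the logs.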
Design
Two-process architecture mirroring nanobot:
bridge/: Node.js subprocess (new)
@whiskeysockets/baileys, the WhatsApp Web protocol, and the linked-device session state on disk
Outbound frames: {type:"send", to, text}, {type:"send_media", to, filePath, mimetype, fileName}
Inbound frames: {type:"message", pn, sender, content, id, isGroup, wasMentioned, media[], timestamp}
Other frames: {type:"auth", token}, {type:"status", status}, {type:"qr", ...}, {type:"error", ...}
src/RockBot.UserProxy.WhatsApp/: .NET frontend (new)
Implements IUserFrontend:
Connects to the local bridge over WebSocket on startup, sends {type:"auth", token: <shared secret>}
Outbound: DisplayReplyAsync → JSON send frame to the bridge
Inbound: listens for message frames, applies the sender allowlist, publishes UserMessage to user.message on the bus (same path CLI/Blazor frontends use today)
Handles both @s.whatsapp.net (legacy phone) and @lid.whatsapp.net (new LID) identity formats, with LID→phone mapping like nanobot does
1000-entry LRU for message-ID dedup
Auto-reconnect with backoff if the bridge connection drops
Configuration (WhatsAppOptions)
BridgeUrl (default ws://localhost:3001): bridge WebSocket endpoint
BridgeToken: shared secret for bridge auth (generated on first run if blank, persisted to a config file mounted from a Kubernetes Secret)
AuthorizedPhoneNumbers: E.164 allowlist (e.g. ["+1234567890"]); messages from anything else are dropped pre-bus
GroupPolicy: Open (respond to all group messages) or Mention (only when @mentioned). Direct messages always get a response if the sender is allowed
MediaDownloadPath: where the bridge writes inbound media (PVC mount)
Least-privilege / "nothing trusts the LLM" alignment
Messages from unauthorized phone numbers are dropped before they become a UserMessage
Bus events tagged with source: whatsapp + principal: <phone> (mirrors #295, the Discord user-proxy frontend for private-channel real-time chat/notifications)
Deployment
Helm wiring under deploy/helm/rockbot/, runs as its own pod (single pod, two containers: the bridge and RockBot.UserProxy.WhatsApp)
Dockerfiles: deploy/Dockerfile.whatsapp-bridge, deploy/Dockerfile.userproxy-whatsapp
PVC for the linked-device auth state (/data/whatsapp-auth) so a pod restart doesn't force re-scanning the QR. Same PVC also hosts inbound media
Initial QR scan (... kubectl logs)
Out of scope (future issues)
Outbound voice / audio synthesis (text and media only)
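The frame shapes and the GroupPolicy rule from the Design section can be sketched as types plus two small functions. This is illustrative only: the field names are copied from the issue, but every field type (for example `media` as a list of file paths and `timestamp` as a number) is an assumption, as is applying the sender allowlist to group messages. `encodeFrame`, `parseMessageFrame`, and `shouldRespond` are hypothetical helper names.

```typescript
// Sketch only: field names come from the issue, field types are assumptions.
type OutboundFrame =
  | { type: "auth"; token: string }
  | { type: "send"; to: string; text: string }
  | { type: "send_media"; to: string; filePath: string; mimetype: string; fileName: string };

interface MessageFrame {
  type: "message";
  pn: string;
  sender: string;
  content: string;
  id: string;
  isGroup: boolean;
  wasMentioned: boolean;
  media: string[];
  timestamp: number;
}

type GroupPolicy = "Open" | "Mention";

/** Serialize a frontend-to-bridge frame as one JSON text message. */
function encodeFrame(frame: OutboundFrame): string {
  return JSON.stringify(frame);
}

/** Parse an inbound "message" frame; returns null for anything else. */
function parseMessageFrame(raw: string): MessageFrame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const f = data as Record<string, unknown>;
  const id = f["id"];
  const sender = f["sender"];
  const content = f["content"];
  const pn = f["pn"];
  const media = f["media"];
  const timestamp = f["timestamp"];
  if (f["type"] !== "message") return null;
  if (typeof id !== "string" || typeof sender !== "string" || typeof content !== "string") {
    return null;
  }
  return {
    type: "message",
    id,
    sender,
    content,
    pn: typeof pn === "string" ? pn : "",
    isGroup: f["isGroup"] === true,
    wasMentioned: f["wasMentioned"] === true,
    media: Array.isArray(media) ? media.filter((m): m is string => typeof m === "string") : [],
    timestamp: typeof timestamp === "number" ? timestamp : 0,
  };
}

/** GroupPolicy rule: DMs always pass for allowed senders; groups depend on policy. */
function shouldRespond(
  frame: Pick<MessageFrame, "isGroup" | "wasMentioned">,
  policy: GroupPolicy,
  senderAllowed: boolean,
): boolean {
  if (!senderAllowed) return false; // assumed to apply in groups as well
  if (!frame.isGroup) return true;
  return policy === "Open" || frame.wasMentioned;
}
```

Modeling the frames as a discriminated union on `type` lets the compiler reject a `send_media` frame that is missing `filePath`, which is cheaper than discovering the mistake at the bridge.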