Personal AI assistant on Android — agent loop + tool calling + dual Chat & Telegram interface
For an AI to graduate from "gadget you query" to true personal assistant, the transition from desktop to native mobile isn't just logical — it's indispensable. A browser tab on a phone is the gap between a work tool and a life companion.
The desktop is a sedentary workspace. You go there to produce, code, or write. But most needs for assistance arise when you're not in front of a 27-inch screen — in the street, in the kitchen, mid-conversation, or at the grocery store.
- Desktop: a destination you visit.
- Mobile: an extension of yourself. An assistant that stays on a computer is a part-time assistant.
Nobody wants a web interface on their phone. The gap in performance and integration is brutal:
- Responsiveness: a native app uses local device resources. Waiting for a web page to load, dealing with cookies and page refreshes — this kills the instantaneity a voice command requires.
- System integration: a web page is trapped in its tab. A native app interacts with your contacts, calendar, reminders, and — crucially — your sensors (GPS, accelerometer, camera).
- Always-on mode: only native technology enables voice activation ("wake word") or lock-screen access. Nobody will unlock their phone, open a browser, type a URL, and wait for it to load just to say "remind me to buy bread."
The great leap of AI in 2026 is real-time multimodality. For AI to be truly useful, it must be able to use the camera to identify an object in front of you, or read what's displayed on your screen to help you in another app.
A browser is a silo. It is blind to what's happening on your phone. A native app is the operating system of your digital life.
Mobile web ergonomics are often a clumsy adaptation of desktop:
- Touch latency: the web is less fluid than native.
- Biometrics: instant access via fingerprint or FaceID is seamless in native, often painful in a browser.
| | Web Assistant (Browser) | Native Assistant (App) |
|---|---|---|
| Speed | Depends on network and web engine | Instant (local resources) |
| Sensor access | Very limited | Full (GPS, camera, mic) |
| Interaction | Text and click only | Voice, gesture, vision |
| Availability | Must open browser first | Always running in background |
The desktop remains king for complex content creation, but the smartphone is the throne of execution and assistance. An assistant that isn't seamlessly "in your pocket" is just another tool — not a companion.
Today's AI assistants follow a model of digital feudalism: platforms own the "harvest" — your data, your preferences, your conversation history, your memory. You interact with AI through corporate-controlled interfaces where your context becomes a platform asset, not yours.
This creates three fundamental problems:
- Data leakage by design — every prompt, every conversation, every tool result flows to a third-party server. Your AI knows your schedule, your contacts, your location, your habits. And it belongs to someone else.
- Vendor lock-in — switching AI providers means losing your history, your memory, your workflows. The more you use one platform, the harder it becomes to leave.
- No autonomy — you can't run tasks in the background, schedule prompts, or chain tools together. You're limited to what the platform allows, when it allows it.
DroidClaw is built on a different premise: your AI assistant should be sovereign infrastructure that you own and control.
This means three things:
- Hardware autonomy — everything runs on your phone. LLM API calls, tool execution, session management, scheduled tasks. No DroidClaw server. No middleware. Your phone is the server.
- Provider freedom — switch between Anthropic, OpenAI, Gemini, Groq, or OpenRouter at any time. Your memory, sessions, and tools stay intact. Zero vendor lock-in.
- Context ownership — conversation history, long-term memory, and AI context are stored locally as sovereign assets. They belong to you, not to a platform.
DroidClaw is not a chatbot. It is an agentic AI assistant — it doesn't just respond, it acts. It reasons iteratively, calls tools, observes results, and loops until the task is solved.
User prompt
--> LLM reasons (Local LLM or Anthropic / OpenAI / Gemini / Groq)
--> Tool call (GPS, web search, calendar, files, transit...)
--> Result fed back to LLM
--> LLM reasons again, calls another tool if needed
--> Final response to user
The agent has access to 28 tools — from web search and file management to GPS location, public transit routing, weather forecasts, calendar access, OCR, knowledge graph, radio streaming, and more. Each tool produces a dual result: raw data for the AI to reason over, and a clean summary for the user to read.
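In code, the loop is a bounded iteration that feeds tool results back into the message history. A minimal sketch — `Tool`, `ToolCall`, `LlmResponse`, and `LlmClient` are illustrative stand-ins, not DroidClaw's actual classes:

```dart
// Minimal agent-loop sketch. Tool, ToolCall, LlmResponse, and LlmClient are
// illustrative stand-ins for DroidClaw's actual classes.
abstract class Tool {
  String get name;
  Future<String> execute(Map<String, dynamic> args);
}

class ToolCall {
  final String name;
  final Map<String, dynamic> args;
  ToolCall(this.name, this.args);
}

class LlmResponse {
  final String text;
  final List<ToolCall> toolCalls;
  LlmResponse(this.text, [this.toolCalls = const []]);
}

abstract class LlmClient {
  Future<LlmResponse> chat(
      List<Map<String, String>> messages, List<Tool> tools);
}

Future<String> runAgentLoop(
  LlmClient llm,
  Map<String, Tool> tools,
  String userPrompt, {
  int maxIterations = 10,
}) async {
  final messages = [
    {'role': 'user', 'content': userPrompt}
  ];
  for (var i = 0; i < maxIterations; i++) {
    final response = await llm.chat(messages, tools.values.toList());
    if (response.toolCalls.isEmpty) return response.text; // final answer
    messages.add({'role': 'assistant', 'content': response.text});
    for (final call in response.toolCalls) {
      // Execute the tool and feed the result back for the next iteration.
      final result = await tools[call.name]!.execute(call.args);
      messages.add({'role': 'tool', 'content': result});
    }
  }
  return 'Stopped: max iterations reached';
}
```

The real loop also emits progress events and persists messages to the session; this sketch keeps only the reason-act-observe cycle.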
The app survives Android's aggressive battery management through a dual-isolate architecture: the main app handles the UI, while an autonomous foreground service runs scheduled tasks and Telegram bot polling — even when Android kills the main app overnight.
A Telegram bot turns your phone into a remote AI server accessible from any device — PC, tablet, another phone — with zero external infrastructure. The phone polls Telegram directly via long polling, no webhook, no public IP needed.
+------------------+
| Your Phone |
| | +------------------+
| +-----------+ | <----> | LLM APIs |
| | DroidClaw | | | (Anthropic, |
| | Agent | | | OpenAI, Gemini) |
| | Loop | | +------------------+
| +-----+-----+ |
| | |
| +-----v-----+ | +------------------+
| | 28 Tools | | <----> | External APIs |
| | GPS, Web, | | | (Brave, ORS, |
| | Calendar, | | | Nominatim, SNCF,|
| | Files ... | | | Telegram) |
| +-----------+ | +------------------+
| |
| Local storage: |
| Sessions, |
| Memory, Config |
+------------------+
Your data stays here.
Privacy-first by design. Sovereign by architecture.
DroidClaw is a personal AI assistant that runs entirely on an Android phone, with no external server.
- Agent-based: agentic LLM loop + iterative tool calling
- Multi-provider: Anthropic (Claude), OpenRouter, OpenAI, Groq, Google Gemini
- Dual interface: built-in Flutter chat + Telegram bot
- Multilingual: English, French, Spanish, German, Italian — switchable from the chat screen (locale switcher in the AppBar)
- Knowledge Graph: persistent memory across conversations — entities, facts, relations, hybrid search (BM25 + vector + graph activation + decay)
- Multi-provider embeddings: Gemini, OpenAI, OpenRouter — vector similarity search for semantic recall
- On-device only: everything runs on the phone — LLM API calls, tool execution, session management
DroidClaw is a port of PicoClaw, a Go-based AI assistant (~16K lines) designed to run as a CLI/gateway on lightweight Linux hardware.
- Agent Loop: the agentic loop (LLM -> tool calls -> iteration)
- LLM Providers: multi-provider abstraction (Anthropic, OpenAI, OpenRouter, Groq, Gemini)
- Tools: web_search (Brave), web_scrape (HTTP+Markdown), web_scrape_js (WebView), file (sandboxed), get_location (GPS), get_address (reverse geocoding), geocode (address to GPS via Nominatim), subagent, message, clipboard, device_info, speak (TTS), open_app (URL/intent launcher), set_alarm, notifications (local notifications/reminders), contacts (read-only), calendar (read/write), ocr (on-device text extraction), qr_generate (QR code images), pick_image (gallery/camera), volume_control (audio levels), get_directions (ORS routing), get_transit (SNCF + IDFM public transit), weather (Open-Meteo/Météo-France)
- Sessions: conversation history with Hive persistence
- Memory: long-term MEMORY.md + daily notes
- Skills: three-tier loading (builtin -> global -> workspace)
- Summarization: automatic summarization of long conversations
Not ported (Linux-only or irrelevant on mobile):
- Shell/exec tools (no shell execution on Android)
- I2C/SPI/USB monitoring (Linux hardware only)
- HTTP health server (no server on mobile)
- Gateway/CLI (replaced by the Flutter UI)
- Flutter chat UI: main interface with Markdown rendering, real-time tool indicators, conversation history
- Telegram bot via Android foreground service: a DroidClaw innovation. PicoClaw had a server-side Telegram channel (webhook). DroidClaw runs polling directly on the Android phone via a foreground service with long polling, with no external server whatsoever. This is a fundamental architecture shift.
- Scheduled Prompts (Cron): define recurring prompts that execute automatically (fixed interval or specific times of day, with day-of-week filtering). Each cron can use a fresh session or continue in the same thread. Managed via Settings > Scheduled Prompts.
- Autonomous cron execution: the foreground service isolate initializes its own AgentLoop (`ServiceAgentFactory`) and executes crons at the exact scheduled time — even when Android kills the main app overnight. Falls back to a pending trigger queue if the service AgentLoop isn't available.
- Reverse geocoding: the `get_address` tool chains with `get_location` to resolve GPS coordinates into a street address (Nominatim/OpenStreetMap, no API key needed).
- Knowledge Graph: persistent memory across conversations using a local SQLite database with FTS5 full-text search, entity resolution (Jaro-Winkler fuzzy matching), bi-temporal fact versioning, spreading activation over the graph, and Ebbinghaus memory decay. Automatic extraction of entities, relations, and facts from each conversation turn via LLM. Two tools: `knowledge_search` (hybrid retrieval) and `knowledge_store` (explicit persistence).
- Multi-provider embeddings: pluggable embedding API layer supporting Gemini (native REST), OpenAI, and OpenRouter. Entity embeddings are computed during KG ingestion and used for vector similarity search in retrieval. The HybridScorer fuses 4 signals: BM25 (lexical), vector cosine similarity (semantic), spreading activation (graph structure), and memory decay (recency). Degrades gracefully when no embeddings are configured.
- Radio France streaming: the `radio` tool plays live Radio France HLS streams (France Inter, France Info, France Culture, France Musique, FIP) via the native Android Media3 MediaSessionService, with background playback and a media notification.
- Native speech-to-text: on-device voice input via Android SpeechRecognizer (replaced cloud-based Groq Whisper). Supports dictation mode with partial results.
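The 4-signal fusion can be sketched as a weighted sum. The weights below are the documented full-mode and degraded-mode values; the `SignalScores` shape is illustrative and assumes each signal is already normalized to [0, 1]:

```dart
// Sketch of the HybridScorer fusion. Weight values match the documented
// modes; the SignalScores shape and prior normalization are assumptions.
class SignalScores {
  final double bm25, vector, activation, decay;
  SignalScores(
      {this.bm25 = 0, this.vector = 0, this.activation = 0, this.decay = 0});
}

double hybridScore(SignalScores s, {required bool hasEmbeddings}) {
  if (hasEmbeddings) {
    // Full mode: lexical + semantic + graph structure + recency.
    return 0.30 * s.bm25 +
        0.30 * s.vector +
        0.25 * s.activation +
        0.15 * s.decay;
  }
  // Degraded mode (no embedding provider): lexical weight absorbs the
  // missing semantic signal.
  return 0.55 * s.bm25 + 0.30 * s.activation + 0.15 * s.decay;
}

void main() {
  final s = SignalScores(bm25: 0.8, vector: 0.6, activation: 0.4, decay: 0.9);
  print(hybridScore(s, hasEmbeddings: true)); // ~0.655
  print(hybridScore(s, hasEmbeddings: false)); // ~0.695
}
```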
graph TB
subgraph "Android App"
subgraph "Main Isolate"
UI["Flutter Chat UI"]
TM["TelegramBotManager"]
BG["BackgroundServiceNotifier"]
AL["AgentLoop"]
CB["ContextBuilder"]
SM["SessionManager"]
TR["ToolRegistry"]
LP["LLMProvider"]
RP["Riverpod Providers"]
end
subgraph "Service Isolate (Foreground Service)"
BTH["BackgroundTaskHandler"]
TA["TelegramApi"]
SAL["Service AgentLoop\n(autonomous cron)"]
end
end
User1["User (app)"] --> UI
User2["User (Telegram)"] --> TG["Telegram API"]
TG --> BTH
BTH <-->|"port comm"| BG
BTH <-->|"port comm"| TM
BTH -->|"cron trigger"| SAL
UI --> AL
TM --> AL
AL --> LP
AL --> TR
AL --> SM
AL --> CB
SAL --> LLM
LP --> LLM["LLM APIs (Anthropic, OpenRouter, ...)"]
AL --> KG["KnowledgeService\n(hybrid search +\nembedding ingestion)"]
KG --> KGDB["SQLite KG DB\n(FTS5 + embedding BLOBs)"]
KG --> EP["EmbeddingProvider\n(Gemini / OpenAI)"]
TR --> Tools["28 tools: web_search / web_scrape / file / get_location / knowledge_search / knowledge_store / get_directions / get_transit / weather / radio / ..."]
sequenceDiagram
participant U as User
participant AL as AgentLoop
participant LLM as LLM Provider
participant T as Tools
participant S as Session
U->>AL: message
AL->>S: add user message
loop max N iterations
AL->>LLM: chat(messages, tools)
LLM-->>AL: response
alt no tool calls
AL->>S: add assistant response
AL-->>U: final response
else has tool calls
AL->>S: add assistant + tool_calls
AL->>T: execute(tool_name, args)
T-->>AL: ToolResult (forLLM / forUser)
AL->>S: add tool result
end
end
graph LR
subgraph "Service Isolate (Foreground Service)"
GP["getUpdates\n(long poll 30s)"]
SM2["sendMessage"]
CR["Cron Scheduler"]
SAL2["Service AgentLoop"]
end
subgraph "Main Isolate"
BGN["BackgroundServiceNotifier"]
BM["TelegramBotManager\nper-chat queues\nmax 3 concurrent"]
AL2["AgentLoop"]
end
TG2["Telegram Server"] <-->|"HTTPS"| GP
TG2 <-->|"HTTPS"| SM2
GP -->|"sendDataToMain"| BM
BM -->|"processMessage"| AL2
AL2 -->|"response"| BM
BM -->|"sendDataToTask"| SM2
CR -->|"autonomous"| SAL2
CR -->|"fallback\nsendDataToMain"| BGN
BGN -->|"processMessage"| AL2
lib/
├── main.dart # Entry point, init Hive + SharedPrefs
├── app.dart # MaterialApp, routing, Material 3 theme
│
├── core/ # Business logic (no Flutter UI imports)
│ ├── agent/ # Agent loop, context builder, memory, ServiceAgentFactory
│ ├── config/ # AppConfig, ConfigStorage, CronConfig
│ ├── knowledge/ # Knowledge Graph (DB, ingestion, hybrid search, algorithms)
│ ├── providers/ # LLM + Embedding abstraction (Anthropic, HTTP, Gemini, factory)
│ ├── services/ # BackgroundTaskHandler (foreground service isolate)
│ ├── session/ # Conversation persistence (Hive)
│ ├── skills/ # Three-tier loader and installer
│ └── tools/ # Tool interface + 28 implementations
│
├── features/ # Screens and platform features
│ ├── chat/ # Main screen, message bubbles, history
│ ├── onboarding/ # First-launch setup
│ ├── settings/ # Provider, tools, skills, cron, Telegram, embedding, knowledge
│ ├── telegram/ # Bot API, bot manager, rate limiter
│ └── voice/ # Voice input (native speech-to-text)
│
├── l10n/ # i18n: ARB files (EN/FR/ES/DE/IT), generated code, tr() helper
├── providers/ # Riverpod: app, chat, background service, Telegram
├── data/local/ # Unified StorageService
└── shared/ # Constants
111 Dart files in total.
- Flutter chat: direct on-device interaction with Markdown rendering, real-time tool indicators, and session history
- Telegram bot: remote access from any device (PC, tablet, another phone), even when the Android phone sits in a pocket with the screen off. The user sends a message on Telegram; the phone processes it in the background and replies.
Both interfaces use the same AgentLoop. Telegram uses separate session keys (telegram_<chat_id>) so conversations don't mix. Other users (family, team) can also talk to the bot if the whitelist allows it.
Three concrete reasons:
- No public API: the WhatsApp Business API requires Meta verification, a hosted server, and webhook endpoints (HTTPS with a public IP). An Android phone behind NAT/4G cannot receive webhooks.
- Long polling doesn't exist: WhatsApp has no equivalent to Telegram's `getUpdates`. It's webhook-only.
- Complexity vs. value: the WhatsApp Business Cloud API requires OAuth registration, webhook validation, message templates, and a server to receive callbacks. This negates the principle of a 100% on-device app.
Telegram won because: simple HTTP long polling (works behind any NAT), no server needed, open Bot API, free, and widely used.
Android cannot reliably host an HTTP server:
- No fixed public IP (NAT, cellular networks, dynamic IPs)
- Android aggressively kills background processes
- Even foreground services have restrictions (Android 12+ limits, `dataSync` 6-hour cap on Android 15)
Solution: long polling (client-initiated HTTP requests) instead of webhooks (server-side). The phone asks Telegram "any new messages?" every 30 seconds — no inbound port, no server, works behind any NAT.
Webhook model (impossible): Long polling model (DroidClaw):
Telegram -> phone:8443 Phone -> Telegram API
(blocked by NAT/firewall) (works from anywhere)
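A minimal polling loop using only `dart:io`, written against the public Bot API's `getUpdates` endpoint — a sketch, not DroidClaw's actual `TelegramApi` class:

```dart
import 'dart:convert';
import 'dart:io';

// Long-polling sketch: the phone initiates every request, so it works
// behind any NAT. Endpoint and parameters follow the public Bot API.
Future<void> pollTelegram(String botToken) async {
  final client = HttpClient();
  var offset = 0; // last processed update_id + 1 (acknowledges updates)
  while (true) {
    final uri = Uri.parse('https://api.telegram.org/bot$botToken/getUpdates'
        '?timeout=30&offset=$offset'); // server holds the request up to 30 s
    final request = await client.getUrl(uri);
    final response = await request.close();
    final body = jsonDecode(await response.transform(utf8.decoder).join());
    for (final update in (body['result'] as List? ?? [])) {
      offset = (update['update_id'] as int) + 1;
      final text = update['message']?['text'];
      if (text != null) {
        // Hand the message to the agent loop here.
        print('received: $text');
      }
    }
  }
}
```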
- `dataSync` has a 6-hour execution limit per 24h on Android 15+
- `remoteMessaging|location` has no time limit — `remoteMessaging` for Telegram polling, `location` for GPS access from background crons
- The foreground service displays a persistent notification ("DroidClaw Bot - Active")
- The service survives backgrounding and app kill
class ToolResult {
  final String forLLM; // Context for the model (complete data)
  final String forUser; // UI display (formatted, truncated)
}

The LLM receives the raw data it needs to reason. The user sees a clean, formatted version.
Triggered when: 20+ messages OR estimated tokens > 75% of maxTokens. Keeps the last 4 messages intact, summarizes the rest via an LLM call, prepends as system context. Prevents context window overflow in long conversations.
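The trigger condition can be sketched as follows. The chars/4 token estimator is an assumption for illustration; the thresholds and keep-last-4 rule are the documented ones:

```dart
// Summarization trigger sketch. The chars/4 token estimate is an assumption;
// the 20-message / 75% thresholds and keep-last-4 rule are as documented.
bool shouldSummarize(List<String> messages, int maxTokens) {
  final estimatedTokens =
      messages.fold<int>(0, (sum, m) => sum + m.length ~/ 4);
  return messages.length >= 20 || estimatedTokens > maxTokens * 0.75;
}

// Keep the last 4 messages verbatim; everything older gets summarized.
({List<String> toSummarize, List<String> kept}) splitForSummary(
    List<String> messages) {
  final cut = messages.length > 4 ? messages.length - 4 : 0;
  return (
    toSummarize: messages.sublist(0, cut),
    kept: messages.sublist(cut),
  );
}
```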
sealed class AgentEvent {}
class ThinkingEvent extends AgentEvent { ... }
class ToolCallEvent extends AgentEvent { ... }
class ToolResultEvent extends AgentEvent { ... }
class ResponseEvent extends AgentEvent { ... }

Both interfaces (chat UI and Telegram) consume the same Stream<AgentEvent>. The chat UI renders each event in real time. Telegram only sends the final ResponseEvent.
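A sketch of how both consumers can share one stream — the wiring is illustrative, and the class shapes mirror the ones above:

```dart
import 'dart:async';

// Sketch: two consumers of the same broadcast event stream.
sealed class AgentEvent {}

class ThinkingEvent extends AgentEvent {}

class ToolCallEvent extends AgentEvent {
  final String tool;
  ToolCallEvent(this.tool);
}

class ResponseEvent extends AgentEvent {
  final String text;
  ResponseEvent(this.text);
}

void wireInterfaces(Stream<AgentEvent> events) {
  final broadcast = events.asBroadcastStream();
  // Chat UI: render every event in real time (thinking, tool calls, response).
  broadcast.listen((e) => print('UI: ${e.runtimeType}'));
  // Telegram: forward only the final response.
  broadcast
      .where((e) => e is ResponseEvent)
      .cast<ResponseEvent>()
      .listen((e) => print('Telegram: ${e.text}'));
}
```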
DroidClaw is fully localized in 5 languages — English, French, Spanish, German, and Italian (~380 ARB keys per locale). The user switches language from a flag icon in the chat AppBar — the change propagates instantly to the entire app, tool outputs, notifications, and the agent's response language.
Two access patterns coexist:
- `AppLocalizations.of(context)` — standard Flutter, used in all UI screens (`lib/features/`)
- `tr(languageCode)` — context-free pure Dart function, used in `lib/core/` and the service isolate where no `BuildContext` exists
The service isolate (foreground service) receives the locale via SharedPreferences cache. Tools that produce user-facing output (weather, transit, date/time, device info) receive the locale via constructor injection and localize their forUser result while keeping forLLM in English for consistent LLM reasoning.
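A context-free helper of this kind reduces to a pure map lookup with an English fallback. A minimal sketch — the keys, strings, and two-argument signature here are illustrative, not DroidClaw's actual `tr()`:

```dart
// Context-free translation sketch for code with no BuildContext
// (core logic, service isolate). Keys and signature are illustrative.
const _strings = <String, Map<String, String>>{
  'en': {'weather.rain': 'Rain expected'},
  'fr': {'weather.rain': 'Pluie attendue'},
};

String tr(String languageCode, String key) =>
    _strings[languageCode]?[key] ?? _strings['en']![key] ?? key;

void main() {
  print(tr('fr', 'weather.rain')); // Pluie attendue
  print(tr('de', 'weather.rain')); // no German entry: falls back to English
}
```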
class CronDefinition {
final String name;
final String prompt;
final CronSchedule schedule; // interval or timeOfDay
final SessionStrategy sessionStrategy; // newEach or sameThread
}

Users define recurring prompts from Settings > Scheduled Prompts. Each cron runs on its configured schedule (fixed interval with a 15-minute minimum, or specific times of day with optional day-of-week filtering). The agent processes the prompt like a normal user message.
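Next-run computation for the two schedule kinds can be sketched as follows — illustrative shapes, not DroidClaw's actual `CronSchedule`; the 15-minute floor is the documented minimum:

```dart
// Next-run sketch for the two documented schedule kinds.
DateTime nextIntervalRun(DateTime lastRun, Duration interval) {
  const minInterval = Duration(minutes: 15); // documented floor
  final effective = interval < minInterval ? minInterval : interval;
  return lastRun.add(effective);
}

DateTime nextTimeOfDayRun(
  DateTime now,
  int hour,
  int minute, {
  Set<int>? weekdays, // DateTime.monday..DateTime.sunday; null = every day
}) {
  var candidate = DateTime(now.year, now.month, now.day, hour, minute);
  // Advance day by day until the time is in the future and the weekday
  // passes the filter.
  while (!candidate.isAfter(now) ||
      (weekdays != null && !weekdays.contains(candidate.weekday))) {
    candidate = candidate.add(const Duration(days: 1));
  }
  return candidate;
}
```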
Autonomous execution: the service isolate initializes its own AgentLoop via ServiceAgentFactory — no main app needed. API keys are cached in SharedPreferences (read from FlutterSecureStorage on main isolate, since service isolate can't use FlutterSecureStorage). If the service AgentLoop init fails, crons fall back to a persistent pending queue that replays when the app is opened.
Tool availability depends on execution context:
| Tool | App running (main isolate) | Cron autonomous (service isolate) |
|---|---|---|
| `web_search` | Yes | Yes |
| `web_scrape` | Yes | Yes |
| `web_scrape_js` | Yes | No — requires WebView (Flutter Activity) |
| `file` | Yes | Yes |
| `get_location` | Yes | Yes — permission must be pre-granted from app |
| `get_address` | Yes | Yes — pure HTTP (Nominatim) |
| `geocode` | Yes | Yes — pure HTTP (Nominatim) |
| `subagent` | Yes | No — complex lifecycle |
| `message` | Yes | No — no UI in service isolate |
| `clipboard` | Yes | No — read requires foreground (Android 10+) |
| `get_datetime` | Yes | Yes |
| `device_info` | Yes | Yes |
| `speak` | Yes | No — audio focus, no user context |
| `open_app` | Yes | No — launches Activity, jarring from background |
| `set_alarm` | Yes | No — opens Clock app, jarring from background |
| `notifications` | Yes | No — initialization requires Activity context |
| `contacts` | Yes | No — ContentProvider unreliable from background |
| `calendar` | Yes | No — ContentProvider unreliable from background |
| `ocr` | Yes | Yes — ML Kit via platform channels |
| `qr_generate` | Yes | Yes — dart:ui rendering on FlutterEngine |
| `pick_image` | Yes | No — image picker UI needs Activity |
| `volume_control` | Yes | No — MethodChannel on Activity engine only |
| `get_directions` | Yes | Yes — pure HTTP (OpenRouteService API) |
| `get_transit` | Yes | Yes — pure HTTP (SNCF + PRIM APIs) |
| `weather` | Yes | Yes — pure HTTP (Open-Meteo) |
| `knowledge_search` | Yes | Yes — SQLite + optional HTTP (embedding API) |
| `knowledge_store` | Yes | Yes — SQLite |
| `radio` | Yes | No — MediaSessionService requires Activity FlutterEngine |
The service isolate runs on a separate FlutterEngine with platform channel access (via GeneratedPluginRegistrant). It can use web_search, web_scrape, file, get_location, get_address, geocode, device_info, ocr, qr_generate, get_directions, get_transit, and weather. WebView-based tools, UI-dependent tools, permission-requiring tools (contacts, calendar, notifications), and tools with real-world side effects (TTS, app launches, alarms) are excluded. get_location requires that the user has granted location permission from the app at least once. When Android kills the app overnight and a cron triggers at 3 AM, the service isolate executes it autonomously. If the service AgentLoop init fails, crons fall back to a persistent pending queue that replays when the app is opened.
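The context split can be expressed as a filtered tool registry. A sketch, with the service-safe set taken from the availability table above (the registry shape itself is illustrative):

```dart
// Context-dependent tool registration sketch: the service isolate only gets
// tools that work without an Activity. Names match the availability table;
// the registry shape is illustrative.
enum ExecutionContext { mainIsolate, serviceIsolate }

const _serviceSafeTools = {
  'web_search', 'web_scrape', 'file', 'get_location', 'get_address',
  'geocode', 'get_datetime', 'device_info', 'ocr', 'qr_generate',
  'get_directions', 'get_transit', 'weather', 'knowledge_search',
  'knowledge_store',
};

List<String> toolsFor(ExecutionContext ctx, List<String> allTools) {
  if (ctx == ExecutionContext.mainIsolate) return allTools;
  // Service isolate: drop WebView-, UI-, and permission-dependent tools.
  return allTools.where(_serviceSafeTools.contains).toList();
}
```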
The agent has access to 28 tools. The LLM decides autonomously when to call each tool based on the conversation context. Each tool returns a ToolResult.dual() — full data for the LLM, clean summary for the user.
Users can enable or disable individual tools from Settings > Tools > Manage Tools.
| Tool | Name | Description |
|---|---|---|
| Web Search | `web_search` | Searches the web via the Brave Search API. Returns titles, URLs, and snippets. Requires a Brave API key configured in settings. |
| Web Scrape | `web_scrape` | Lightweight HTTP scraper. Fetches a page via HTTP GET, parses the HTML DOM with package:html, converts to structured Markdown via html2md (preserves headings, links, lists). Max 15K chars. Fast, low resources. If the result is empty, the page likely requires JavaScript. |
| Web Scrape (JS) | `web_scrape_js` | Heavy WebView scraper. Loads the page in a headless flutter_inappwebview that executes JavaScript, waits for rendering, then extracts the DOM and converts to Markdown. For SPAs, React/Vue apps, and dynamic sites. Images disabled, 30s timeout, WebView disposed after use. |
| File | `file` | Sandboxed file operations within the app workspace: read_file, write_file, list_dir. Path validation prevents directory traversal outside the sandbox. |
| GPS Location | `get_location` | Returns the device's current GPS coordinates (latitude, longitude, accuracy, altitude). Uses Android's FusedLocationProviderClient via the geolocator package, with automatic fallback from GPS to network location. Handles permission requests and service availability checks. |
| Reverse Geocoding | `get_address` | Converts GPS coordinates (latitude, longitude) into a human-readable street address using the Nominatim (OpenStreetMap) reverse geocoding API. Free, no API key required. The LLM chains this with get_location: first get GPS coords, then resolve to an address. |
| Geocode | `geocode` | Converts a text address or place name into GPS coordinates (latitude, longitude) using the Nominatim (OpenStreetMap) Search API. Returns up to N matching results with relevance scores. Supports optional country code filtering. The LLM chains this with get_directions or get_transit: first geocode the address, then route to the destination. Free, no API key required. |
| Sub-agent | `subagent` | Spawns a sub-task with a fresh session. The main agent delegates a focused task to a sub-agent, which processes it independently and returns the result. The sub-agent session is cleaned up after completion. |
| Message | `message` | Internal tool for sending messages directly to the user interface. Always enabled (not toggleable). Returns a silent result — the LLM sees no output, but the user sees the message. |
| Clipboard | `clipboard` | Read or write the device clipboard. The agent reads clipboard content when the user asks, or writes formatted text for the user to paste elsewhere. |
| Date & Time | `get_datetime` | Returns the current date, time, day of week, timezone, and Unix timestamp from the device. Pure Dart — no API key, no permissions. Useful for time-aware prompts, scheduling context, and cron debugging. |
| Device Info | `device_info` | Returns battery level and charging status, network connectivity type (WiFi/cellular), device manufacturer, model, and Android version. Useful for context-aware responses. |
| Text to Speech | `speak` | Speaks text aloud using the device's built-in TTS engine. Supports language selection. Fire-and-forget: the agent continues while audio plays. Max 5000 chars. Disabled by default. |
| Open App / URL | `open_app` | Opens URLs and apps on the device: web pages (https:), phone dialer (tel:), email (mailto:), SMS (sms:), maps (geo:). Uses url_launcher with a scheme allowlist for safety. Disabled by default. |
| Alarm / Timer | `set_alarm` | Sets alarms or timers via the system Clock app using Android intents (SET_ALARM, SET_TIMER). The Clock app opens for user confirmation. Disabled by default. |
| Notifications | `notifications` | Create instant or scheduled local notifications. Operations: show (instant), schedule (at a future time with timezone-aware scheduling), cancel (by id), list (pending). Uses separate notification channels for instant vs scheduled. Disabled by default. |
| Contacts | `contacts` | Read-only access to device contacts. Search by name, phone number, or email (client-side filtering). Returns minimal data (name + phones + emails) to protect privacy. Requires READ_CONTACTS permission (requested at first use). Disabled by default. |
| Calendar | `calendar` | Read and create calendar events. Operations: list_calendars (find calendar IDs), get_events (date range query), create_event (with title, location, description). Requires READ_CALENDAR + WRITE_CALENDAR permissions. Disabled by default. |
| OCR | `ocr` | Extract text from images using on-device Google ML Kit text recognition (Latin script). The image must already exist in the workspace (use pick_image or the file tool first). Returns structured text with block count. |
| QR Code | `qr_generate` | Generate QR code PNG images from text, URLs, WiFi configs, or contact info. Saves a 512x512 PNG to the workspace. Max 4296 characters. |
| Image Picker | `pick_image` | Open the system image picker to select a photo from the gallery or take a new photo with the camera. The image is copied to the workspace images/ directory for further processing (e.g. OCR). Requires CAMERA permission for camera source. Disabled by default. |
| Volume Control | `volume_control` | Read and adjust device volume levels for alarm, media, ringtone, and notification streams. Reports ringer mode (normal/vibrate/silent). Use before set_alarm to verify alarm volume is audible. Supports human-readable levels (mute/low/medium/high/max). First custom MethodChannel to Android AudioManager. |
| Directions | `get_directions` | Route calculation between two GPS coordinates via OpenRouteService API v2. Supports car, bike, road bike, mountain bike, walk, hike, and wheelchair profiles. Returns distance, duration, elevation gain/loss, and turn-by-turn instructions. Also supports isochrone calculation (reachable area within a time budget). Requires a free ORS API key. |
| Public Transit | `get_transit` | Find public transit routes in France. Auto-routes between two APIs: PRIM/IDFM for Ile-de-France (Metro, RER, Bus, Tram, Transilien) and SNCF for national trains (TGV, TER, Intercites). Returns the top 3 journey options with departure/arrival times, transfers, CO2 emissions, and a section-by-section itinerary. Supports departure/arrival time constraints and wheelchair-accessible routes. Both APIs use Navitia technology with shared response parsing. |
| Weather | `weather` | Weather forecast using the Open-Meteo API with Météo-France high-precision models (AROME 1.3km + ARPEGE). Returns a daily summary (min/max temperature, precipitation, wind, conditions) and an hourly breakdown by period (morning/afternoon/evening). 1-7 day forecast. WMO weather codes interpreted to localized descriptions (EN/FR/ES/DE/IT). No API key required. |
| Knowledge Search | `knowledge_search` | Search the persistent Knowledge Graph for remembered information — people, places, events, concepts from past conversations. Performs hybrid search (FTS5 text matching + vector cosine similarity + graph activation + Ebbinghaus memory decay) and returns ranked entities with their facts and relations. |
| Knowledge Store | `knowledge_store` | Explicitly store a fact in the Knowledge Graph. Used when the user asks to remember something ("remember that my dentist is Dr. Martin"). Persists entity-key-value triplets with fuzzy entity resolution (Jaro-Winkler matching to existing entities). |
| Radio | `radio` | Play live Radio France HLS streams in the background. Stations: France Inter, France Info, France Culture, France Musique, FIP. Operations: play, stop, pause, resume, status. Uses the native Android Media3 MediaSessionService with a background playback notification. Disabled by default. |
The LLM is guided by the tool descriptions to use a two-step approach:
- Try `web_scrape` first — fast, lightweight, works for most static sites
- Fall back to `web_scrape_js` — only when `web_scrape` returns empty (JS-rendered SPA)
Both tools share a common htmlToMarkdown() utility that strips noise elements (<nav>, <footer>, <aside>, <script>, <style>) and produces clean Markdown with ATX headings and fenced code blocks.
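The fallback strategy reduces to a simple guard. A sketch — `scrapeHttp` and `scrapeWithWebView` are hypothetical stand-ins for the two tools:

```dart
// Two-step scrape sketch: lightweight HTTP first, heavy WebView only when
// the light result is empty. The two function parameters are hypothetical
// stand-ins for web_scrape and web_scrape_js.
Future<String> scrape(
  String url, {
  required Future<String> Function(String) scrapeHttp,
  required Future<String> Function(String) scrapeWithWebView,
}) async {
  final light = await scrapeHttp(url); // fast path: static HTML
  if (light.trim().isNotEmpty) return light;
  // Empty result: the page is likely JS-rendered (SPA), take the heavy path.
  return scrapeWithWebView(url);
}
```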
DroidClaw maintains a persistent Knowledge Graph (KG) — a local SQLite database that remembers information across conversations. Unlike simple chat history, the KG structures knowledge as interconnected entities, facts, and relations, enabling the assistant to recall and reason over past information.
Chat history is linear and ephemeral — it gets summarized and truncated. The Knowledge Graph solves this by extracting structured knowledge from every conversation and storing it permanently:
- "My dentist is Dr. Martin" → entity `Dr. Martin` (PERSON), fact `profession: dentist`, relation `User SEES Dr. Martin`
- "I live at 9 rue de la Paix" → entity `User` (PERSON), fact `address: 9 rue de la Paix`
- "My meeting with Alice is Tuesday at 3pm" → entity `Alice` (PERSON), relation `User MEETS Alice`, fact `meeting: Tuesday 3pm`
Later, when the user asks "Who is my dentist?" or "Where do I live?", the KG retrieves the relevant entities and injects them into the system prompt — even if the original conversation was weeks ago and long since summarized.
Entity (node) Relation (edge) Fact (attribute)
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ id │ │ source_id ───┼──→ Entity │ entity_id ───┼──→ Entity
│ name │ │ target_id ───┼──→ Entity │ key │
│ type │ │ predicate │ │ value │
│ summary │ │ weight │ │ value_type │
│ embedding │ (BLOB) │ embedding │ (BLOB) │ valid_at │
│ temperature │ │ confidence │ │ expired_at │
│ access_count │ └──────────────┘ └──────────────┘
│ last_accessed│
└──────────────┘
Entity types: PERSON, PLACE, ORG, EVENT, CONCEPT, DATE.
Temperature (Ebbinghaus memory decay): entities are classified as hot (recently accessed), warm (moderate), cool (aging), or cold (candidates for deactivation). The decay follows an exponential retention curve: R = e^(-t/S) where stability increases with access frequency.
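A sketch of the curve and the bucket classification — the stability formula and bucket thresholds here are illustrative assumptions; only R = e^(-t/S) itself is documented:

```dart
import 'dart:math';

// Ebbinghaus retention sketch: R = e^(-t/S). The stability formula and the
// bucket thresholds are illustrative assumptions, not DroidClaw's values.
double retention(Duration sinceLastAccess, int accessCount) {
  // Stability S grows with access frequency: frequently recalled entities
  // decay more slowly (illustrative formula).
  final s = Duration(days: 1 + accessCount).inSeconds;
  return exp(-sinceLastAccess.inSeconds / s);
}

String temperature(double r) {
  if (r > 0.75) return 'hot';
  if (r > 0.50) return 'warm';
  if (r > 0.25) return 'cool';
  return 'cold'; // candidate for deactivation
}

void main() {
  final r = retention(const Duration(days: 3), 5);
  print('$r -> ${temperature(r)}');
}
```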
Bi-temporal facts: when a fact changes (e.g., the user moves to a new address), the old value is expired (expired_at timestamp) and the new one inserted. The KG preserves full history — it knows both the current and previous values.
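The expire-then-insert step can be sketched in memory — the real store does this inside a SQLite transaction, and the `Fact` shape here is illustrative:

```dart
// In-memory sketch of the bi-temporal upsert: expire the old value, insert
// the new one, keep full history. The Fact shape is illustrative.
class Fact {
  final String entityId, key, value;
  final DateTime validAt;
  DateTime? expiredAt; // null = currently valid
  Fact(this.entityId, this.key, this.value, this.validAt);
}

void upsertFact(List<Fact> store, Fact incoming) {
  for (final f in store) {
    // Expire the currently valid value for the same (entity, key) pair.
    if (f.entityId == incoming.entityId &&
        f.key == incoming.key &&
        f.expiredAt == null) {
      f.expiredAt = incoming.validAt;
    }
  }
  store.add(incoming); // history preserved: expired rows stay in the store
}

void main() {
  final store = <Fact>[];
  upsertFact(
      store, Fact('user', 'address', '9 rue de la Paix', DateTime(2025, 1, 1)));
  upsertFact(
      store, Fact('user', 'address', '12 avenue Foch', DateTime(2025, 6, 1)));
  final current = store.where((f) => f.expiredAt == null);
  print(current.single.value); // 12 avenue Foch
}
```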
After each conversation turn, the KG extraction runs asynchronously (fire-and-forget, never blocks the chat):
User message + Assistant response
│
▼
┌─────────────────────────────┐
│ 1. LLM Entity Extraction │ ← Structured JSON extraction prompt
│ (entities, relations, │ (temperature: 0.1, max_tokens: 2048)
│ facts) │
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 2. Entity Resolution │ ← Jaro-Winkler fuzzy matching (0.88 threshold)
│ (deduplicate against │ + FTS5 candidate search + alias table
│ existing entities) │
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 3. Relation + Fact Storage │ ← Bi-temporal upsert (expire old, insert new)
│ (inside DB transaction) │ Unique triplet constraint (source, pred, target)
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 4. Embedding Computation │ ← Batch API call to configured provider
│ (outside transaction) │ Text: "Name (TYPE): summary"
│ │ Stored as Float32 BLOB (~3 KB per entity)
└─────────────────────────────┘
Step 4 is optional — if no embedding provider is configured, entities are stored without vectors and the retrieval pipeline uses degraded scoring (text search only).
Before each LLM call, the KG is queried for relevant context. The pipeline fuses 4 independent signals:
User query: "Where do I live?"
│
▼
┌──────────────────────────────┐
│ 1. Query Expansion (LLM) │ "Where do I live?"
│ max_tokens: 50, temp: 0 │ → "address home residence habite domicile lieu"
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 2a. FTS5 BM25 Search │ Text matching on entity names, summaries, facts
│ (entities + facts) │ → Signal 1: lexical relevance
├──────────────────────────────┤
│ 2b. Vector Similarity Search │ Embed query → cosine similarity with all entity
│ (cosine, threshold > 0.5) │ embeddings → Signal 2: semantic relevance
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 3. Graph Neighbor Loading │ Load 2-hop neighbors from candidate entities
│ (2-hop subgraph) │ → Expands candidate pool via graph structure
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 4. Spreading Activation │ BFS propagation from seed entities across
│ (decay 0.85, 4 hops) │ weighted edges → Signal 3: graph centrality
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 5. Memory Decay (Ebbinghaus) │ Retention score based on last access time +
│ R = e^(-t/S) │ access frequency → Signal 4: recency/importance
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 6. HybridScorer Fusion │ Normalizes + weighted sum of all 4 signals
│ │
│ Full mode (with embeddings): │ 0.30 BM25 + 0.30 vector + 0.25 activation + 0.15 decay
│ Degraded (no embeddings): │ 0.55 BM25 + 0.30 activation + 0.15 decay
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 7. Top-K → System Prompt │ Entities + facts + relations formatted as XML
│ (reinforcement: touch) │ and injected into the LLM context
└──────────────────────────────┘
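Step 4 (spreading activation) amounts to a bounded BFS over the weighted edges; the adjacency-dict layout and the keep-the-max rule below are assumptions, while the 0.85 decay and 4-hop limit come from the pipeline above:

```python
from collections import deque

def spread_activation(graph: dict[int, list[tuple[int, float]]],
                      seeds: dict[int, float],
                      decay: float = 0.85, max_hops: int = 4) -> dict[int, float]:
    """Propagate activation from seed entities across weighted edges.
    Each hop multiplies the signal by decay * edge_weight; a node keeps
    its strongest activation seen so far."""
    activation = dict(seeds)
    frontier = deque((node, act, 0) for node, act in seeds.items())
    while frontier:
        node, act, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for neighbor, weight in graph.get(node, []):
            new_act = act * decay * weight
            if new_act > activation.get(neighbor, 0.0):
                activation[neighbor] = new_act
                frontier.append((neighbor, new_act, hops + 1))
    return activation

# A seed "User" entity (id 1) activates its address fact node (id 2)
# strongly, and a weakly-linked node (id 3) less so.
graph = {1: [(2, 1.0)], 2: [(3, 0.5)]}
act = spread_activation(graph, {1: 1.0})
```

Entities far from every seed, or reachable only through low-weight edges, receive near-zero activation and fall out of the candidate pool at the fusion step.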
This hybrid approach means the KG can find relevant information even when the user's query uses different vocabulary than the stored data. For example, "Where do I live?" matches the fact address: 9 rue de la Paix through three complementary paths:
- Query expansion generates keywords like "address", "home", "domicile"
- Vector similarity bridges the semantic gap between "live" and "address"
- Graph activation boosts the User entity and its connected facts
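The HybridScorer fusion (step 6 above) reduces to normalizing each signal and taking a weighted sum. The min-max normalization below is an assumption; the weights are the ones listed in the pipeline:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one signal to [0, 1] (illustrative choice)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

FULL_WEIGHTS = {"bm25": 0.30, "vector": 0.30, "activation": 0.25, "decay": 0.15}
DEGRADED_WEIGHTS = {"bm25": 0.55, "activation": 0.30, "decay": 0.15}

def fuse(signals: dict[str, dict[str, float]],
         has_embeddings: bool) -> dict[str, float]:
    """Weighted sum of the normalized signals; an entity missing from
    a signal contributes 0 for that signal."""
    weights = FULL_WEIGHTS if has_embeddings else DEGRADED_WEIGHTS
    normed = {name: normalize(signals.get(name, {})) for name in weights}
    entities = {e for s in normed.values() for e in s}
    return {e: sum(w * normed[name].get(e, 0.0)
                   for name, w in weights.items())
            for e in entities}
```

In degraded mode the vector signal simply disappears and its weight is redistributed to BM25, which is why the KG stays functional without an embedding provider.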
Vector similarity search requires an embedding provider. DroidClaw supports three providers, all via remote API calls:
| Provider | Default Model | Free Tier | Dimensions |
|---|---|---|---|
| Gemini (recommended) | gemini-embedding-001 | Generous free tier | 768 (up to 3072) |
| OpenAI | text-embedding-3-small | No | 768 (up to 1536) |
| OpenRouter | openai/text-embedding-3-small | No | 768 (up to 1536) |
By default, the embedding provider reuses the LLM provider's API key. A separate key can be configured if needed (e.g., using Anthropic for chat + Gemini for embeddings).
Configure in Settings > Embedding. Enable the Knowledge Graph in Settings > Knowledge.
AppConfig.tools.disabledTools (persisted in SharedPreferences)
↓
toolRegistryProvider (rebuilds on config change)
↓ only registers enabled tools
ToolRegistry
↓
ContextBuilder (system prompt lists available tools)
             ↓
AgentLoop (sends tool definitions to LLM, executes tool calls)
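The flow above is essentially a filter over the full tool set, rebuilt whenever the config changes. The sketch below is illustrative Python (the app itself is Dart/Flutter); the tool names are taken from the zero-config list later in this document, but the spec fields are hypothetical:

```python
# Hypothetical full tool set; only names match DroidClaw's actual tools.
ALL_TOOLS = {
    "web_search":   {"description": "Search the web",        "requires_key": True},
    "get_datetime": {"description": "Current date and time", "requires_key": False},
    "clipboard":    {"description": "Read/write clipboard",  "requires_key": False},
}

def build_registry(disabled_tools: set[str]) -> dict[str, dict]:
    """Rebuild the registry from config: only enabled tools are registered,
    so the LLM never receives definitions for disabled ones."""
    return {name: spec for name, spec in ALL_TOOLS.items()
            if name not in disabled_tools}

# User disables web_search in Settings -> registry rebuilds without it.
registry = build_registry(disabled_tools={"web_search"})
```

Because the system prompt and the tool definitions sent to the LLM are both derived from this registry, a disabled tool is invisible to the model rather than merely rejected at call time.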
- Flutter 3.38+
- Android SDK (API 24+)
flutter pub get
flutter analyze
flutter build apk --release --split-per-abi
adb install build/app/outputs/flutter-apk/app-arm64-v8a-release.apk
- Onboarding: choose your LLM provider (OpenRouter, Anthropic, OpenAI, Groq, Gemini)
- Enter the API key
- Test the connection
- Start chatting
DroidClaw requires API keys for the LLM provider and for some tools. All keys are stored securely on the device (FlutterSecureStorage). No key is ever sent to a third-party server.
You need one LLM provider key to use DroidClaw:
| Provider | Free Tier | Guide |
|---|---|---|
| OpenRouter (recommended) | Free models available, pay-as-you-go | Get key |
| Anthropic (Claude) | $5 trial credit | Get key |
| OpenAI (GPT) | $5 trial credit | Get key |
| Groq (Llama, Mixtral) | Generous free tier | Get key |
| Google Gemini | 15 req/min free | Get key |
These keys unlock specific tools. The agent works without them, but the corresponding tools will be unavailable.
| Service | Tool | Free Tier | Guide |
|---|---|---|---|
| Brave Search | web_search | 2,000 queries/month | Get key |
| OpenRouteService | get_directions | 2,000 req/day | Get key |
| SNCF | get_transit (national trains) | 5,000 req/day | Get key |
| PRIM / IDFM | get_transit (Île-de-France) | 1,000 req/day | Get key |
The Knowledge Graph's vector search uses an embedding API. By default it reuses the LLM provider's API key — a separate key is only needed if you use a different embedding provider.
| Provider | Default Model | Free Tier | Dimensions |
|---|---|---|---|
| Gemini (recommended) | gemini-embedding-001 | Generous free tier | 768 (up to 3072) |
| OpenAI | text-embedding-3-small | No | 768 (up to 1536) |
| OpenRouter | openai/text-embedding-3-small | No | 768 (up to 1536) |
Configure in Settings > Embedding. If no embedding provider is configured, the Knowledge Graph falls back to degraded mode (BM25 text search only — still functional, just less semantically precise).
| Service | Purpose | Guide |
|---|---|---|
| Telegram Bot | Remote access via Telegram | Get token |
These tools work out of the box, no configuration needed: web_scrape, web_scrape_js, file, get_location, get_address, subagent, message, clipboard, geocode, get_datetime, device_info, speak, open_app, set_alarm, notifications, contacts, calendar, ocr, qr_generate, pick_image, volume_control, weather, knowledge_search, knowledge_store, radio.
| Metric | Value |
|---|---|
| Dart files | 111 |
| Analysis issues | 0 |
| APK size (arm64) | 39.5 MB |
| Languages | English, French, Spanish, German, Italian |
| Native code | Kotlin (AudioChannelPlugin — volume control, RadioPlaybackService — Media3 streaming) |
| minSdkVersion | 24 (Android 7.0) |
| targetSdkVersion | 34 (Android 14) |
DroidClaw is pronounced "ARaccoon" — The Raccoon. This is the name shown in the Android launcher and the project's mascot.
- The claws: raccoons are famous for their extremely dexterous front paws, capable of manipulating objects, picking locks, and rummaging everywhere. They are the perfect embodiment of iterative tool calling — the agent that calls web_search, parses the results, follows up with web_scrape, extracts the info, and loops until it finds the answer.
- Intelligence and resourcefulness: clever and adaptable, raccoons always find a solution. Exactly what an AI agent does when it loops, fails, adjusts its strategy, and eventually solves the problem.
- The nocturnal and discreet side: active at night, a bit "bandit-like" (in the cute sense). This evokes the off-grid nature of the application — everything runs locally on the phone, no data is sent to a central server, total privacy.