Personal AI assistant on Android — agent loop + tool calling + dual Chat & Telegram interface
For an AI to graduate from "gadget you query" to true personal assistant, the transition from desktop to native mobile isn't just logical — it's indispensable. A browser tab on a phone is the gap between a work tool and a life companion.
The desktop is a sedentary workspace. You go there to produce, code, or write. But most needs for assistance arise when you're not in front of a 27-inch screen — in the street, in the kitchen, mid-conversation, or at the grocery store.
- Desktop: a destination you visit.
- Mobile: an extension of yourself. An assistant that stays on a computer is a part-time assistant.
Nobody wants a web interface on their phone. The gap in performance and integration is brutal:
- Responsiveness: a native app uses local device resources. Waiting for a web page to load, dealing with cookies and page refreshes — this kills the instantaneity a voice command requires.
- System integration: a web page is trapped in its tab. A native app interacts with your contacts, calendar, reminders, and — crucially — your sensors (GPS, accelerometer, camera).
- Always-on mode: only native technology enables voice activation ("wake word") or lock-screen access. Nobody will unlock their phone, open a browser, type a URL, and wait for it to load just to say "remind me to buy bread."
The great leap of AI in 2026 is real-time multimodality. For AI to be truly useful, it must be able to use the camera to identify an object in front of you, or read what's displayed on your screen to help you in another app.
A browser is a silo. It is blind to what's happening on your phone. A native app is the operating system of your digital life.
Mobile web ergonomics are often a clumsy adaptation of desktop:
- Touch latency: the web is less fluid than native.
- Biometrics: instant access via fingerprint or FaceID is seamless in native, often painful in a browser.
| | Web Assistant (Browser) | Native Assistant (App) |
|---|---|---|
| Speed | Depends on network and web engine | Instant (local resources) |
| Sensor access | Very limited | Full (GPS, camera, mic) |
| Interaction | Text and click only | Voice, gesture, vision |
| Availability | Must open browser first | Always running in background |
The desktop remains king for complex content creation, but the smartphone is the throne of execution and assistance. An assistant that isn't seamlessly "in your pocket" is just another tool — not a companion.
Today's AI assistants follow a model of digital feudalism: platforms own the "harvest" — your data, your preferences, your conversation history, your memory. You interact with AI through corporate-controlled interfaces where your context becomes a platform asset, not yours.
This creates three fundamental problems:
- Data leakage by design — every prompt, every conversation, every tool result flows to a third-party server. Your AI knows your schedule, your contacts, your location, your habits. And it belongs to someone else.
- Vendor lock-in — switching AI providers means losing your history, your memory, your workflows. The more you use one platform, the harder it becomes to leave.
- No autonomy — you can't run tasks in the background, schedule prompts, or chain tools together. You're limited to what the platform allows, when it allows it.
DroidClaw is built on a different premise: your AI assistant should be sovereign infrastructure that you own and control.
This means three things:
- Hardware autonomy — everything runs on your phone. LLM API calls, tool execution, session management, scheduled tasks. No DroidClaw server. No middleware. Your phone is the server.
- Provider freedom — switch between Anthropic, OpenAI, Gemini, Groq, or OpenRouter at any time. Your memory, sessions, and tools stay intact. Zero vendor lock-in.
- Context ownership — conversation history, long-term memory, and AI context are stored locally as sovereign assets. They belong to you, not to a platform.
DroidClaw is not a chatbot. It is an agentic AI assistant — it doesn't just respond, it acts. It reasons iteratively, calls tools, observes results, and loops until the task is solved.
User prompt
--> LLM reasons (Local LLM or Anthropic / OpenAI / Gemini / Groq)
--> Tool call (GPS, web search, calendar, files, transit...)
--> Result fed back to LLM
--> LLM reasons again, calls another tool if needed
--> Final response to user
The agent has access to 28 tools — from web search and file management to GPS location, public transit routing, weather forecasts, calendar access, OCR, knowledge graph, radio streaming, and more. Each tool produces a dual result: raw data for the AI to reason over, and a clean summary for the user to read.
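In code, the loop is a bounded iteration that feeds tool results back into the message history. A minimal sketch — `Tool`, `ToolCall`, `LlmResponse`, and `LlmClient` are illustrative stand-ins, not DroidClaw's actual classes:

```dart
// Minimal agent-loop sketch. Tool, ToolCall, LlmResponse, and LlmClient are
// illustrative stand-ins for DroidClaw's actual classes.
abstract class Tool {
  String get name;
  Future<String> execute(Map<String, dynamic> args);
}

class ToolCall {
  final String name;
  final Map<String, dynamic> args;
  ToolCall(this.name, this.args);
}

class LlmResponse {
  final String text;
  final List<ToolCall> toolCalls;
  LlmResponse(this.text, [this.toolCalls = const []]);
}

abstract class LlmClient {
  Future<LlmResponse> chat(
      List<Map<String, String>> messages, List<Tool> tools);
}

Future<String> runAgentLoop(
  LlmClient llm,
  Map<String, Tool> tools,
  String userPrompt, {
  int maxIterations = 10,
}) async {
  final messages = [
    {'role': 'user', 'content': userPrompt}
  ];
  for (var i = 0; i < maxIterations; i++) {
    final response = await llm.chat(messages, tools.values.toList());
    if (response.toolCalls.isEmpty) return response.text; // final answer
    messages.add({'role': 'assistant', 'content': response.text});
    for (final call in response.toolCalls) {
      // Execute the tool and feed the result back for the next iteration.
      final result = await tools[call.name]!.execute(call.args);
      messages.add({'role': 'tool', 'content': result});
    }
  }
  return 'Stopped: max iterations reached';
}
```

The real loop also emits progress events and persists messages to the session; this sketch keeps only the reason-act-observe cycle.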
The app survives Android's aggressive battery management through a dual-isolate architecture: the main app handles the UI, while an autonomous foreground service runs scheduled tasks and Telegram bot polling — even when Android kills the main app overnight.
A Telegram bot turns your phone into a remote AI server accessible from any device — PC, tablet, another phone — with zero external infrastructure. The phone polls Telegram directly via long polling, no webhook, no public IP needed.
+------------------+
| Your Phone |
| | +------------------+
| +-----------+ | <----> | LLM APIs |
| | DroidClaw | | | (Anthropic, |
| | Agent | | | OpenAI, Gemini) |
| | Loop | | +------------------+
| +-----+-----+ |
| | |
| +-----v-----+ | +------------------+
| | 28 Tools | | <----> | External APIs |
| | GPS, Web, | | | (Brave, ORS, |
| | Calendar, | | | Nominatim, SNCF,|
| | Files ... | | | Telegram) |
| +-----------+ | +------------------+
| |
| Local storage: |
| Sessions, |
| Memory, Config |
+------------------+
Your data stays here.
Privacy-first by design. Sovereign by architecture.
DroidClaw is a personal AI assistant that runs entirely on an Android phone, with no external server.
- Agent-based: agentic LLM loop + iterative tool calling
- Multi-provider: Anthropic (Claude), OpenRouter, OpenAI, Groq, Google Gemini
- Dual interface: built-in Flutter chat + Telegram bot
- Multilingual: English, French, Spanish, German, Italian — switchable from the chat screen (locale switcher in the AppBar)
- Knowledge Graph: persistent memory across conversations — entities, facts, relations, hybrid search (BM25 + vector + graph activation + decay)
- Multi-provider embeddings: Gemini, OpenAI, OpenRouter — vector similarity search for semantic recall
- On-device only: everything runs on the phone — LLM API calls, tool execution, session management
DroidClaw is a port of PicoClaw, a Go-based AI assistant (~16K lines) designed to run as a CLI/gateway on lightweight Linux hardware.
- Agent Loop: the agentic loop (LLM -> tool calls -> iteration)
- LLM Providers: multi-provider abstraction (Anthropic, OpenAI, OpenRouter, Groq, Gemini)
- Tools: web_search (Brave), web_scrape (HTTP+Markdown), web_scrape_js (WebView), file (sandboxed), get_location (GPS), get_address (reverse geocoding), geocode (address to GPS via Nominatim), subagent, message, clipboard, device_info, speak (TTS), open_app (URL/intent launcher), set_alarm, notifications (local notifications/reminders), contacts (read-only), calendar (read/write), ocr (on-device text extraction), qr_generate (QR code images), pick_image (gallery/camera), volume_control (audio levels), get_directions (ORS routing), get_transit (SNCF + IDFM public transit), weather (Open-Meteo/Météo-France)
- Sessions: conversation history with Hive persistence
- Memory: long-term MEMORY.md + daily notes
- Skills: three-tier loading (builtin -> global -> workspace)
- Summarization: automatic summarization of long conversations
Not ported (Linux-only or irrelevant on mobile):
- Shell/exec tools (no shell execution on Android)
- I2C/SPI/USB monitoring (Linux hardware only)
- HTTP health server (no server on mobile)
- Gateway/CLI (replaced by the Flutter UI)
- Flutter chat UI: main interface with Markdown rendering, real-time tool indicators, conversation history
- Telegram bot via Android foreground service: a DroidClaw innovation. PicoClaw had a server-side Telegram channel (webhook). DroidClaw runs polling directly on the Android phone via a foreground service with long polling, with no external server whatsoever. This is a fundamental architecture shift.
- Scheduled Prompts (Cron): define recurring prompts that execute automatically (fixed interval or specific times of day, with day-of-week filtering). Each cron can use a fresh session or continue in the same thread. Managed via Settings > Scheduled Prompts.
- Autonomous cron execution: the foreground service isolate initializes its own AgentLoop (`ServiceAgentFactory`) and executes crons at the exact scheduled time — even when Android kills the main app overnight. Falls back to a pending trigger queue if the service AgentLoop isn't available.
- Reverse geocoding: the `get_address` tool chains with `get_location` to resolve GPS coordinates into a street address (Nominatim/OpenStreetMap, no API key needed).
- Knowledge Graph: persistent memory across conversations using a local SQLite database with FTS5 full-text search, entity resolution (Jaro-Winkler fuzzy matching), bi-temporal fact versioning, spreading activation over the graph, and Ebbinghaus memory decay. Automatic extraction of entities, relations, and facts from each conversation turn via LLM. Two tools: `knowledge_search` (hybrid retrieval) and `knowledge_store` (explicit persistence).
- Multi-provider embeddings: pluggable embedding API layer supporting Gemini (native REST), OpenAI, and OpenRouter. Entity embeddings are computed during KG ingestion and used for vector similarity search in retrieval. The HybridScorer fuses 4 signals: BM25 (lexical), vector cosine similarity (semantic), spreading activation (graph structure), and memory decay (recency). Degrades gracefully when no embeddings are configured.
- Radio France streaming: the `radio` tool plays live Radio France HLS streams (France Inter, France Info, France Culture, France Musique, FIP) via the native Android Media3 MediaSessionService, with background playback and a media notification.
- Native speech-to-text: on-device voice input via Android SpeechRecognizer (replaced cloud-based Groq Whisper). Supports dictation mode with partial results.
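The 4-signal fusion can be sketched as a weighted sum. The weights below are the documented full-mode and degraded-mode values; the `SignalScores` shape is illustrative and assumes each signal is already normalized to [0, 1]:

```dart
// Sketch of the HybridScorer fusion. Weight values match the documented
// modes; the SignalScores shape and prior normalization are assumptions.
class SignalScores {
  final double bm25, vector, activation, decay;
  SignalScores(
      {this.bm25 = 0, this.vector = 0, this.activation = 0, this.decay = 0});
}

double hybridScore(SignalScores s, {required bool hasEmbeddings}) {
  if (hasEmbeddings) {
    // Full mode: lexical + semantic + graph structure + recency.
    return 0.30 * s.bm25 +
        0.30 * s.vector +
        0.25 * s.activation +
        0.15 * s.decay;
  }
  // Degraded mode (no embedding provider): lexical weight absorbs the
  // missing semantic signal.
  return 0.55 * s.bm25 + 0.30 * s.activation + 0.15 * s.decay;
}

void main() {
  final s = SignalScores(bm25: 0.8, vector: 0.6, activation: 0.4, decay: 0.9);
  print(hybridScore(s, hasEmbeddings: true)); // ~0.655
  print(hybridScore(s, hasEmbeddings: false)); // ~0.695
}
```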
graph TB
subgraph "Android App"
subgraph "Main Isolate"
UI["Flutter Chat UI"]
TM["TelegramBotManager"]
BG["BackgroundServiceNotifier"]
AL["AgentLoop"]
CB["ContextBuilder"]
SM["SessionManager"]
TR["ToolRegistry"]
LP["LLMProvider"]
RP["Riverpod Providers"]
end
subgraph "Service Isolate (Foreground Service)"
BTH["BackgroundTaskHandler"]
TA["TelegramApi"]
SAL["Service AgentLoop\n(autonomous cron)"]
end
end
User1["User (app)"] --> UI
User2["User (Telegram)"] --> TG["Telegram API"]
TG --> BTH
BTH <-->|"port comm"| BG
BTH <-->|"port comm"| TM
BTH -->|"cron trigger"| SAL
UI --> AL
TM --> AL
AL --> LP
AL --> TR
AL --> SM
AL --> CB
SAL --> LLM
LP --> LLM["LLM APIs (Anthropic, OpenRouter, ...)"]
AL --> KG["KnowledgeService\n(hybrid search +\nembedding ingestion)"]
KG --> KGDB["SQLite KG DB\n(FTS5 + embedding BLOBs)"]
KG --> EP["EmbeddingProvider\n(Gemini / OpenAI)"]
TR --> Tools["28 tools: web_search / web_scrape / file / get_location / knowledge_search / knowledge_store / get_directions / get_transit / weather / radio / ..."]
sequenceDiagram
participant U as User
participant AL as AgentLoop
participant LLM as LLM Provider
participant T as Tools
participant S as Session
U->>AL: message
AL->>S: add user message
loop max N iterations
AL->>LLM: chat(messages, tools)
LLM-->>AL: response
alt no tool calls
AL->>S: add assistant response
AL-->>U: final response
else has tool calls
AL->>S: add assistant + tool_calls
AL->>T: execute(tool_name, args)
T-->>AL: ToolResult (forLLM / forUser)
AL->>S: add tool result
end
end
graph LR
subgraph "Service Isolate (Foreground Service)"
GP["getUpdates\n(long poll 30s)"]
SM2["sendMessage"]
CR["Cron Scheduler"]
SAL2["Service AgentLoop"]
end
subgraph "Main Isolate"
BGN["BackgroundServiceNotifier"]
BM["TelegramBotManager\nper-chat queues\nmax 3 concurrent"]
AL2["AgentLoop"]
end
TG2["Telegram Server"] <-->|"HTTPS"| GP
TG2 <-->|"HTTPS"| SM2
GP -->|"sendDataToMain"| BM
BM -->|"processMessage"| AL2
AL2 -->|"response"| BM
BM -->|"sendDataToTask"| SM2
CR -->|"autonomous"| SAL2
CR -->|"fallback\nsendDataToMain"| BGN
BGN -->|"processMessage"| AL2
lib/
├── main.dart # Entry point, init Hive + SharedPrefs
├── app.dart # MaterialApp, routing, Material 3 theme
│
├── core/ # Business logic (no Flutter UI imports)
│ ├── agent/ # Agent loop, context builder, memory, ServiceAgentFactory
│ ├── config/ # AppConfig, ConfigStorage, CronConfig
│ ├── knowledge/ # Knowledge Graph (DB, ingestion, hybrid search, algorithms)
│ ├── providers/ # LLM + Embedding abstraction (Anthropic, HTTP, Gemini, factory)
│ ├── services/ # BackgroundTaskHandler (foreground service isolate)
│ ├── session/ # Conversation persistence (Hive)
│ ├── skills/ # Three-tier loader and installer
│ └── tools/ # Tool interface + 28 implementations
│
├── features/ # Screens and platform features
│ ├── chat/ # Main screen, message bubbles, history
│ ├── onboarding/ # First-launch setup
│ ├── settings/ # Provider, tools, skills, cron, Telegram, embedding, knowledge
│ ├── telegram/ # Bot API, bot manager, rate limiter
│ └── voice/ # Voice input (native speech-to-text)
│
├── l10n/ # i18n: ARB files (EN/FR/ES/DE/IT), generated code, tr() helper
├── providers/ # Riverpod: app, chat, background service, Telegram
├── data/local/ # Unified StorageService
└── shared/ # Constants
111 Dart files in total.
- Flutter chat: direct on-device interaction with Markdown rendering, real-time tool indicators, and session history
- Telegram bot: remote access from any device (PC, tablet, another phone), even when the Android phone sits in a pocket with the screen off. The user sends a message on Telegram; the phone processes it in the background and replies.
Both interfaces use the same AgentLoop. Telegram uses separate session keys (telegram_<chat_id>) so conversations don't mix. Other users (family, team) can also talk to the bot if the whitelist allows it.
Three concrete reasons:
- No public API: the WhatsApp Business API requires Meta verification, a hosted server, and webhook endpoints (HTTPS with a public IP). An Android phone behind NAT/4G cannot receive webhooks.
- Long polling doesn't exist: WhatsApp has no equivalent to Telegram's `getUpdates`. It's webhook-only.
- Complexity vs. value: the WhatsApp Business Cloud API requires OAuth registration, webhook validation, message templates, and a server to receive callbacks. This negates the principle of a 100% on-device app.
Telegram won because: simple HTTP long polling (works behind any NAT), no server needed, open Bot API, free, and widely used.
Android cannot reliably host an HTTP server:
- No fixed public IP (NAT, cellular networks, dynamic IPs)
- Android aggressively kills background processes
- Even foreground services have restrictions (Android 12+ limits, `dataSync` 6-hour cap on Android 15)
Solution: long polling (client-initiated HTTP requests) instead of webhooks (server-side). The phone asks Telegram "any new messages?" every 30 seconds — no inbound port, no server, works behind any NAT.
Webhook model (impossible): Long polling model (DroidClaw):
Telegram -> phone:8443 Phone -> Telegram API
(blocked by NAT/firewall) (works from anywhere)
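A minimal polling loop using only `dart:io`, written against the public Bot API's `getUpdates` endpoint — a sketch, not DroidClaw's actual `TelegramApi` class:

```dart
import 'dart:convert';
import 'dart:io';

// Long-polling sketch: the phone initiates every request, so it works
// behind any NAT. Endpoint and parameters follow the public Bot API.
Future<void> pollTelegram(String botToken) async {
  final client = HttpClient();
  var offset = 0; // last processed update_id + 1 (acknowledges updates)
  while (true) {
    final uri = Uri.parse('https://api.telegram.org/bot$botToken/getUpdates'
        '?timeout=30&offset=$offset'); // server holds the request up to 30 s
    final request = await client.getUrl(uri);
    final response = await request.close();
    final body = jsonDecode(await response.transform(utf8.decoder).join());
    for (final update in (body['result'] as List? ?? [])) {
      offset = (update['update_id'] as int) + 1;
      final text = update['message']?['text'];
      if (text != null) {
        // Hand the message to the agent loop here.
        print('received: $text');
      }
    }
  }
}
```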
- `dataSync` has a 6-hour execution limit per 24h on Android 15+
- `remoteMessaging|location` has no time limit — `remoteMessaging` for Telegram polling, `location` for GPS access from background crons
- The foreground service displays a persistent notification ("DroidClaw Bot - Active")
- The service survives backgrounding and app kill
class ToolResult {
  final String forLLM; // Context for the model (complete data)
  final String forUser; // UI display (formatted, truncated)
}

The LLM receives the raw data it needs to reason. The user sees a clean, formatted version.
Triggered when: 20+ messages OR estimated tokens > 75% of maxTokens. Keeps the last 4 messages intact, summarizes the rest via an LLM call, prepends as system context. Prevents context window overflow in long conversations.
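The trigger condition can be sketched as follows. The chars/4 token estimator is an assumption for illustration; the thresholds and keep-last-4 rule are the documented ones:

```dart
// Summarization trigger sketch. The chars/4 token estimate is an assumption;
// the 20-message / 75% thresholds and keep-last-4 rule are as documented.
bool shouldSummarize(List<String> messages, int maxTokens) {
  final estimatedTokens =
      messages.fold<int>(0, (sum, m) => sum + m.length ~/ 4);
  return messages.length >= 20 || estimatedTokens > maxTokens * 0.75;
}

// Keep the last 4 messages verbatim; everything older gets summarized.
({List<String> toSummarize, List<String> kept}) splitForSummary(
    List<String> messages) {
  final cut = messages.length > 4 ? messages.length - 4 : 0;
  return (
    toSummarize: messages.sublist(0, cut),
    kept: messages.sublist(cut),
  );
}
```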
sealed class AgentEvent {}
class ThinkingEvent extends AgentEvent { ... }
class ToolCallEvent extends AgentEvent { ... }
class ToolResultEvent extends AgentEvent { ... }
class ResponseEvent extends AgentEvent { ... }

Both interfaces (chat UI and Telegram) consume the same Stream<AgentEvent>. The chat UI renders each event in real time. Telegram only sends the final ResponseEvent.
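A sketch of how both consumers can share one stream — the wiring is illustrative, and the class shapes mirror the ones above:

```dart
import 'dart:async';

// Sketch: two consumers of the same broadcast event stream.
sealed class AgentEvent {}

class ThinkingEvent extends AgentEvent {}

class ToolCallEvent extends AgentEvent {
  final String tool;
  ToolCallEvent(this.tool);
}

class ResponseEvent extends AgentEvent {
  final String text;
  ResponseEvent(this.text);
}

void wireInterfaces(Stream<AgentEvent> events) {
  final broadcast = events.asBroadcastStream();
  // Chat UI: render every event in real time (thinking, tool calls, response).
  broadcast.listen((e) => print('UI: ${e.runtimeType}'));
  // Telegram: forward only the final response.
  broadcast
      .where((e) => e is ResponseEvent)
      .cast<ResponseEvent>()
      .listen((e) => print('Telegram: ${e.text}'));
}
```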
DroidClaw is fully localized in 5 languages — English, French, Spanish, German, and Italian (~380 ARB keys per locale). The user switches language from a flag icon in the chat AppBar — the change propagates instantly to the entire app, tool outputs, notifications, and the agent's response language.
Two access patterns coexist:
- `AppLocalizations.of(context)` — standard Flutter, used in all UI screens (`lib/features/`)
- `tr(languageCode)` — context-free pure Dart function, used in `lib/core/` and the service isolate where no `BuildContext` exists
The service isolate (foreground service) receives the locale via SharedPreferences cache. Tools that produce user-facing output (weather, transit, date/time, device info) receive the locale via constructor injection and localize their forUser result while keeping forLLM in English for consistent LLM reasoning.
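A context-free helper of this kind reduces to a pure map lookup with an English fallback. A minimal sketch — the keys, strings, and two-argument signature here are illustrative, not DroidClaw's actual `tr()`:

```dart
// Context-free translation sketch for code with no BuildContext
// (core logic, service isolate). Keys and signature are illustrative.
const _strings = <String, Map<String, String>>{
  'en': {'weather.rain': 'Rain expected'},
  'fr': {'weather.rain': 'Pluie attendue'},
};

String tr(String languageCode, String key) =>
    _strings[languageCode]?[key] ?? _strings['en']![key] ?? key;

void main() {
  print(tr('fr', 'weather.rain')); // Pluie attendue
  print(tr('de', 'weather.rain')); // no German entry: falls back to English
}
```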
class CronDefinition {
final String name;
final String prompt;
final CronSchedule schedule; // interval or timeOfDay
final SessionStrategy sessionStrategy; // newEach or sameThread
}

Users define recurring prompts from Settings > Scheduled Prompts. Each cron runs on its configured schedule (fixed interval with a 15-minute minimum, or specific times of day with optional day-of-week filtering). The agent processes the prompt like a normal user message.
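Next-run computation for the two schedule kinds can be sketched as follows — illustrative shapes, not DroidClaw's actual `CronSchedule`; the 15-minute floor is the documented minimum:

```dart
// Next-run sketch for the two documented schedule kinds.
DateTime nextIntervalRun(DateTime lastRun, Duration interval) {
  const minInterval = Duration(minutes: 15); // documented floor
  final effective = interval < minInterval ? minInterval : interval;
  return lastRun.add(effective);
}

DateTime nextTimeOfDayRun(
  DateTime now,
  int hour,
  int minute, {
  Set<int>? weekdays, // DateTime.monday..DateTime.sunday; null = every day
}) {
  var candidate = DateTime(now.year, now.month, now.day, hour, minute);
  // Advance day by day until the time is in the future and the weekday
  // passes the filter.
  while (!candidate.isAfter(now) ||
      (weekdays != null && !weekdays.contains(candidate.weekday))) {
    candidate = candidate.add(const Duration(days: 1));
  }
  return candidate;
}
```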
Autonomous execution: the service isolate initializes its own AgentLoop via ServiceAgentFactory — no main app needed. API keys are cached in SharedPreferences (read from FlutterSecureStorage on main isolate, since service isolate can't use FlutterSecureStorage). If the service AgentLoop init fails, crons fall back to a persistent pending queue that replays when the app is opened.
Tool availability depends on execution context:
| Tool | App running (main isolate) | Cron autonomous (service isolate) |
|---|---|---|
| `web_search` | Yes | Yes |
| `web_scrape` | Yes | Yes |
| `web_scrape_js` | Yes | No — requires WebView (Flutter Activity) |
| `file` | Yes | Yes |
| `get_location` | Yes | Yes — permission must be pre-granted from app |
| `get_address` | Yes | Yes — pure HTTP (Nominatim) |
| `geocode` | Yes | Yes — pure HTTP (Nominatim) |
| `subagent` | Yes | No — complex lifecycle |
| `message` | Yes | No — no UI in service isolate |
| `clipboard` | Yes | No — read requires foreground (Android 10+) |
| `get_datetime` | Yes | Yes |
| `device_info` | Yes | Yes |
| `speak` | Yes | No — audio focus, no user context |
| `open_app` | Yes | No — launches Activity, jarring from background |
| `set_alarm` | Yes | No — opens Clock app, jarring from background |
| `notifications` | Yes | No — initialization requires Activity context |
| `contacts` | Yes | No — ContentProvider unreliable from background |
| `calendar` | Yes | No — ContentProvider unreliable from background |
| `ocr` | Yes | Yes — ML Kit via platform channels |
| `qr_generate` | Yes | Yes — dart:ui rendering on FlutterEngine |
| `pick_image` | Yes | No — image picker UI needs Activity |
| `volume_control` | Yes | No — MethodChannel on Activity engine only |
| `get_directions` | Yes | Yes — pure HTTP (OpenRouteService API) |
| `get_transit` | Yes | Yes — pure HTTP (SNCF + PRIM APIs) |
| `weather` | Yes | Yes — pure HTTP (Open-Meteo) |
| `knowledge_search` | Yes | Yes — SQLite + optional HTTP (embedding API) |
| `knowledge_store` | Yes | Yes — SQLite |
| `radio` | Yes | No — MediaSessionService requires Activity FlutterEngine |
The service isolate runs on a separate FlutterEngine with platform channel access (via GeneratedPluginRegistrant). It can use web_search, web_scrape, file, get_location, get_address, geocode, device_info, ocr, qr_generate, get_directions, get_transit, and weather. WebView-based tools, UI-dependent tools, permission-requiring tools (contacts, calendar, notifications), and tools with real-world side effects (TTS, app launches, alarms) are excluded. get_location requires that the user has granted location permission from the app at least once. When Android kills the app overnight and a cron triggers at 3 AM, the service isolate executes it autonomously. If the service AgentLoop init fails, crons fall back to a persistent pending queue that replays when the app is opened.
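The context split can be expressed as a filtered tool registry. A sketch, with the service-safe set taken from the availability table above (the registry shape itself is illustrative):

```dart
// Context-dependent tool registration sketch: the service isolate only gets
// tools that work without an Activity. Names match the availability table;
// the registry shape is illustrative.
enum ExecutionContext { mainIsolate, serviceIsolate }

const _serviceSafeTools = {
  'web_search', 'web_scrape', 'file', 'get_location', 'get_address',
  'geocode', 'get_datetime', 'device_info', 'ocr', 'qr_generate',
  'get_directions', 'get_transit', 'weather', 'knowledge_search',
  'knowledge_store',
};

List<String> toolsFor(ExecutionContext ctx, List<String> allTools) {
  if (ctx == ExecutionContext.mainIsolate) return allTools;
  // Service isolate: drop WebView-, UI-, and permission-dependent tools.
  return allTools.where(_serviceSafeTools.contains).toList();
}
```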
The agent has access to 28 tools. The LLM decides autonomously when to call each tool based on the conversation context. Each tool returns a ToolResult.dual() — full data for the LLM, clean summary for the user.
Users can enable or disable individual tools from Settings > Tools > Manage Tools.
| Tool | Name | Description |
|---|---|---|
| Web Search | `web_search` | Searches the web via the Brave Search API. Returns titles, URLs, and snippets. Requires a Brave API key configured in settings. |
| Web Scrape | `web_scrape` | Lightweight HTTP scraper. Fetches a page via HTTP GET, parses the HTML DOM with package:html, converts to structured Markdown via html2md (preserves headings, links, lists). Max 15K chars. Fast, low resources. If the result is empty, the page likely requires JavaScript. |
| Web Scrape (JS) | `web_scrape_js` | Heavy WebView scraper. Loads the page in a headless flutter_inappwebview that executes JavaScript, waits for rendering, then extracts the DOM and converts to Markdown. For SPAs, React/Vue apps, and dynamic sites. Images disabled, 30s timeout, WebView disposed after use. |
| File | `file` | Sandboxed file operations within the app workspace: read_file, write_file, list_dir. Path validation prevents directory traversal outside the sandbox. |
| GPS Location | `get_location` | Returns the device's current GPS coordinates (latitude, longitude, accuracy, altitude). Uses Android's FusedLocationProviderClient via the geolocator package, with automatic fallback from GPS to network location. Handles permission requests and service availability checks. |
| Reverse Geocoding | `get_address` | Converts GPS coordinates (latitude, longitude) into a human-readable street address using the Nominatim (OpenStreetMap) reverse geocoding API. Free, no API key required. The LLM chains this with get_location: first get GPS coords, then resolve to an address. |
| Geocode | `geocode` | Converts a text address or place name into GPS coordinates (latitude, longitude) using the Nominatim (OpenStreetMap) Search API. Returns up to N matching results with relevance scores. Supports optional country code filtering. The LLM chains this with get_directions or get_transit: first geocode the address, then route to the destination. Free, no API key required. |
| Sub-agent | `subagent` | Spawns a sub-task with a fresh session. The main agent delegates a focused task to a sub-agent, which processes it independently and returns the result. The sub-agent session is cleaned up after completion. |
| Message | `message` | Internal tool for sending messages directly to the user interface. Always enabled (not toggleable). Returns a silent result — the LLM sees no output, but the user sees the message. |
| Clipboard | `clipboard` | Read or write the device clipboard. The agent reads clipboard content when the user asks, or writes formatted text for the user to paste elsewhere. |
| Date & Time | `get_datetime` | Returns the current date, time, day of week, timezone, and Unix timestamp from the device. Pure Dart — no API key, no permissions. Useful for time-aware prompts, scheduling context, and cron debugging. |
| Device Info | `device_info` | Returns battery level and charging status, network connectivity type (WiFi/cellular), device manufacturer, model, and Android version. Useful for context-aware responses. |
| Text to Speech | `speak` | Speaks text aloud using the device's built-in TTS engine. Supports language selection. Fire-and-forget: the agent continues while audio plays. Max 5000 chars. Disabled by default. |
| Open App / URL | `open_app` | Opens URLs and apps on the device: web pages (https:), phone dialer (tel:), email (mailto:), SMS (sms:), maps (geo:). Uses url_launcher with a scheme allowlist for safety. Disabled by default. |
| Alarm / Timer | `set_alarm` | Sets alarms or timers via the system Clock app using Android intents (SET_ALARM, SET_TIMER). The Clock app opens for user confirmation. Disabled by default. |
| Notifications | `notifications` | Create instant or scheduled local notifications. Operations: show (instant), schedule (at a future time with timezone-aware scheduling), cancel (by id), list (pending). Uses separate notification channels for instant vs scheduled. Disabled by default. |
| Contacts | `contacts` | Read-only access to device contacts. Search by name, phone number, or email (client-side filtering). Returns minimal data (name + phones + emails) to protect privacy. Requires READ_CONTACTS permission (requested at first use). Disabled by default. |
| Calendar | `calendar` | Read and create calendar events. Operations: list_calendars (find calendar IDs), get_events (date range query), create_event (with title, location, description). Requires READ_CALENDAR + WRITE_CALENDAR permissions. Disabled by default. |
| OCR | `ocr` | Extract text from images using on-device Google ML Kit text recognition (Latin script). The image must already exist in the workspace (use pick_image or the file tool first). Returns structured text with block count. |
| QR Code | `qr_generate` | Generate QR code PNG images from text, URLs, WiFi configs, or contact info. Saves a 512x512 PNG to the workspace. Max 4296 characters. |
| Image Picker | `pick_image` | Open the system image picker to select a photo from the gallery or take a new photo with the camera. The image is copied to the workspace images/ directory for further processing (e.g. OCR). Requires CAMERA permission for camera source. Disabled by default. |
| Volume Control | `volume_control` | Read and adjust device volume levels for alarm, media, ringtone, and notification streams. Reports ringer mode (normal/vibrate/silent). Use before set_alarm to verify alarm volume is audible. Supports human-readable levels (mute/low/medium/high/max). First custom MethodChannel to Android AudioManager. |
| Directions | `get_directions` | Route calculation between two GPS coordinates via OpenRouteService API v2. Supports car, bike, road bike, mountain bike, walk, hike, and wheelchair profiles. Returns distance, duration, elevation gain/loss, and turn-by-turn instructions. Also supports isochrone calculation (reachable area within a time budget). Requires a free ORS API key. |
| Public Transit | `get_transit` | Find public transit routes in France. Auto-routes between two APIs: PRIM/IDFM for Ile-de-France (Metro, RER, Bus, Tram, Transilien) and SNCF for national trains (TGV, TER, Intercites). Returns the top 3 journey options with departure/arrival times, transfers, CO2 emissions, and a section-by-section itinerary. Supports departure/arrival time constraints and wheelchair-accessible routes. Both APIs use Navitia technology with shared response parsing. |
| Weather | `weather` | Weather forecast using the Open-Meteo API with Météo-France high-precision models (AROME 1.3km + ARPEGE). Returns a daily summary (min/max temperature, precipitation, wind, conditions) and an hourly breakdown by period (morning/afternoon/evening). 1-7 day forecast. WMO weather codes interpreted to localized descriptions (EN/FR/ES/DE/IT). No API key required. |
| Knowledge Search | `knowledge_search` | Search the persistent Knowledge Graph for remembered information — people, places, events, concepts from past conversations. Performs hybrid search (FTS5 text matching + vector cosine similarity + graph activation + Ebbinghaus memory decay) and returns ranked entities with their facts and relations. |
| Knowledge Store | `knowledge_store` | Explicitly store a fact in the Knowledge Graph. Used when the user asks to remember something ("remember that my dentist is Dr. Martin"). Persists entity-key-value triplets with fuzzy entity resolution (Jaro-Winkler matching to existing entities). |
| Radio | `radio` | Play live Radio France HLS streams in the background. Stations: France Inter, France Info, France Culture, France Musique, FIP. Operations: play, stop, pause, resume, status. Uses the native Android Media3 MediaSessionService with a background playback notification. Disabled by default. |
The LLM is guided by the tool descriptions to use a two-step approach:
- Try `web_scrape` first — fast, lightweight, works for most static sites
- Fall back to `web_scrape_js` — only when `web_scrape` returns empty (JS-rendered SPA)
Both tools share a common htmlToMarkdown() utility that strips noise elements (<nav>, <footer>, <aside>, <script>, <style>) and produces clean Markdown with ATX headings and fenced code blocks.
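The fallback strategy reduces to a simple guard. A sketch — `scrapeHttp` and `scrapeWithWebView` are hypothetical stand-ins for the two tools:

```dart
// Two-step scrape sketch: lightweight HTTP first, heavy WebView only when
// the light result is empty. The two function parameters are hypothetical
// stand-ins for web_scrape and web_scrape_js.
Future<String> scrape(
  String url, {
  required Future<String> Function(String) scrapeHttp,
  required Future<String> Function(String) scrapeWithWebView,
}) async {
  final light = await scrapeHttp(url); // fast path: static HTML
  if (light.trim().isNotEmpty) return light;
  // Empty result: the page is likely JS-rendered (SPA), take the heavy path.
  return scrapeWithWebView(url);
}
```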
DroidClaw maintains a persistent Knowledge Graph (KG) — a local SQLite database that remembers information across conversations. Unlike simple chat history, the KG structures knowledge as interconnected entities, facts, and relations, enabling the assistant to recall and reason over past information.
Chat history is linear and ephemeral — it gets summarized and truncated. The Knowledge Graph solves this by extracting structured knowledge from every conversation and storing it permanently:
- "My dentist is Dr. Martin" → entity `Dr. Martin` (PERSON), fact `profession: dentist`, relation `User SEES Dr. Martin`
- "I live at 9 rue de la Paix" → entity `User` (PERSON), fact `address: 9 rue de la Paix`
- "My meeting with Alice is Tuesday at 3pm" → entity `Alice` (PERSON), relation `User MEETS Alice`, fact `meeting: Tuesday 3pm`
Later, when the user asks "Who is my dentist?" or "Where do I live?", the KG retrieves the relevant entities and injects them into the system prompt — even if the original conversation was weeks ago and long since summarized.
Entity (node) Relation (edge) Fact (attribute)
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ id │ │ source_id ───┼──→ Entity │ entity_id ───┼──→ Entity
│ name │ │ target_id ───┼──→ Entity │ key │
│ type │ │ predicate │ │ value │
│ summary │ │ weight │ │ value_type │
│ embedding │ (BLOB) │ embedding │ (BLOB) │ valid_at │
│ temperature │ │ confidence │ │ expired_at │
│ access_count │ └──────────────┘ └──────────────┘
│ last_accessed│
└──────────────┘
Entity types: PERSON, PLACE, ORG, EVENT, CONCEPT, DATE.
Temperature (Ebbinghaus memory decay): entities are classified as hot (recently accessed), warm (moderate), cool (aging), or cold (candidates for deactivation). The decay follows an exponential retention curve: R = e^(-t/S) where stability increases with access frequency.
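A sketch of the curve and the bucket classification — the stability formula and bucket thresholds here are illustrative assumptions; only R = e^(-t/S) itself is documented:

```dart
import 'dart:math';

// Ebbinghaus retention sketch: R = e^(-t/S). The stability formula and the
// bucket thresholds are illustrative assumptions, not DroidClaw's values.
double retention(Duration sinceLastAccess, int accessCount) {
  // Stability S grows with access frequency: frequently recalled entities
  // decay more slowly (illustrative formula).
  final s = Duration(days: 1 + accessCount).inSeconds;
  return exp(-sinceLastAccess.inSeconds / s);
}

String temperature(double r) {
  if (r > 0.75) return 'hot';
  if (r > 0.50) return 'warm';
  if (r > 0.25) return 'cool';
  return 'cold'; // candidate for deactivation
}

void main() {
  final r = retention(const Duration(days: 3), 5);
  print('$r -> ${temperature(r)}');
}
```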
Bi-temporal facts: when a fact changes (e.g., the user moves to a new address), the old value is expired (expired_at timestamp) and the new one inserted. The KG preserves full history — it knows both the current and previous values.
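The expire-then-insert step can be sketched in memory — the real store does this inside a SQLite transaction, and the `Fact` shape here is illustrative:

```dart
// In-memory sketch of the bi-temporal upsert: expire the old value, insert
// the new one, keep full history. The Fact shape is illustrative.
class Fact {
  final String entityId, key, value;
  final DateTime validAt;
  DateTime? expiredAt; // null = currently valid
  Fact(this.entityId, this.key, this.value, this.validAt);
}

void upsertFact(List<Fact> store, Fact incoming) {
  for (final f in store) {
    // Expire the currently valid value for the same (entity, key) pair.
    if (f.entityId == incoming.entityId &&
        f.key == incoming.key &&
        f.expiredAt == null) {
      f.expiredAt = incoming.validAt;
    }
  }
  store.add(incoming); // history preserved: expired rows stay in the store
}

void main() {
  final store = <Fact>[];
  upsertFact(
      store, Fact('user', 'address', '9 rue de la Paix', DateTime(2025, 1, 1)));
  upsertFact(
      store, Fact('user', 'address', '12 avenue Foch', DateTime(2025, 6, 1)));
  final current = store.where((f) => f.expiredAt == null);
  print(current.single.value); // 12 avenue Foch
}
```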
After each conversation turn, the KG extraction runs asynchronously (fire-and-forget, never blocks the chat):
User message + Assistant response
│
▼
┌─────────────────────────────┐
│ 1. LLM Entity Extraction │ ← Structured JSON extraction prompt
│ (entities, relations, │ (temperature: 0.1, max_tokens: 2048)
│ facts) │
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 2. Entity Resolution │ ← Jaro-Winkler fuzzy matching (0.88 threshold)
│ (deduplicate against │ + FTS5 candidate search + alias table
│ existing entities) │
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 3. Relation + Fact Storage │ ← Bi-temporal upsert (expire old, insert new)
│ (inside DB transaction) │ Unique triplet constraint (source, pred, target)
└──────────────┬──────────────┘
▼
┌─────────────────────────────┐
│ 4. Embedding Computation │ ← Batch API call to configured provider
│ (outside transaction) │ Text: "Name (TYPE): summary"
│ │ Stored as Float32 BLOB (~3 KB per entity)
└─────────────────────────────┘
Step 4 is optional — if no embedding provider is configured, entities are stored without vectors and the retrieval pipeline uses degraded scoring (text search only).
Before each LLM call, the KG is queried for relevant context. The pipeline fuses 4 independent signals:
User query: "Where do I live?"
│
▼
┌──────────────────────────────┐
│ 1. Query Expansion (LLM) │ "Where do I live?"
│ max_tokens: 50, temp: 0 │ → "address home residence habite domicile lieu"
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 2a. FTS5 BM25 Search │ Text matching on entity names, summaries, facts
│ (entities + facts) │ → Signal 1: lexical relevance
├──────────────────────────────┤
│ 2b. Vector Similarity Search │ Embed query → cosine similarity with all entity
│ (cosine, threshold > 0.5) │ embeddings → Signal 2: semantic relevance
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 3. Graph Neighbor Loading │ Load 2-hop neighbors from candidate entities
│ (2-hop subgraph) │ → Expands candidate pool via graph structure
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 4. Spreading Activation │ BFS propagation from seed entities across
│ (decay 0.85, 4 hops) │ weighted edges → Signal 3: graph centrality
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 5. Memory Decay (Ebbinghaus) │ Retention score based on last access time +
│ R = e^(-t/S) │ access frequency → Signal 4: recency/importance
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 6. HybridScorer Fusion │ Normalizes + weighted sum of all 4 signals
│ │
│ Full mode (with embeddings): │ 0.30 BM25 + 0.30 vector + 0.25 activation + 0.15 decay
│ Degraded (no embeddings): │ 0.55 BM25 + 0.30 activation + 0.15 decay
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 7. Top-K → System Prompt │ Entities + facts + relations formatted as XML
│ (reinforcement: touch) │ and injected into the LLM context
└──────────────────────────────┘
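Step 4 (spreading activation) amounts to a bounded BFS over the weighted edges; the adjacency-dict layout and the keep-the-max rule below are assumptions, while the 0.85 decay and 4-hop limit come from the pipeline above:

```python
from collections import deque

def spread_activation(graph: dict[int, list[tuple[int, float]]],
                      seeds: dict[int, float],
                      decay: float = 0.85, max_hops: int = 4) -> dict[int, float]:
    """Propagate activation from seed entities across weighted edges.
    Each hop multiplies the signal by decay * edge_weight; a node keeps
    its strongest activation seen so far."""
    activation = dict(seeds)
    frontier = deque((node, act, 0) for node, act in seeds.items())
    while frontier:
        node, act, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for neighbor, weight in graph.get(node, []):
            new_act = act * decay * weight
            if new_act > activation.get(neighbor, 0.0):
                activation[neighbor] = new_act
                frontier.append((neighbor, new_act, hops + 1))
    return activation

# A seed "User" entity (id 1) activates its address fact node (id 2)
# strongly, and a weakly-linked node (id 3) less so.
graph = {1: [(2, 1.0)], 2: [(3, 0.5)]}
act = spread_activation(graph, {1: 1.0})
```

Entities far from every seed, or reachable only through low-weight edges, receive near-zero activation and fall out of the candidate pool at the fusion step.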
This hybrid approach means the KG can find relevant information even when the user's query uses different vocabulary than the stored data. For example, "Where do I live?" matches the fact address: 9 rue de la Paix through three complementary paths:
- Query expansion generates keywords like "address", "home", "domicile"
- Vector similarity bridges the semantic gap between "live" and "address"
- Graph activation boosts the User entity and its connected facts
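The HybridScorer fusion (step 6 above) reduces to normalizing each signal and taking a weighted sum. The min-max normalization below is an assumption; the weights are the ones listed in the pipeline:

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize one signal to [0, 1] (illustrative choice)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

FULL_WEIGHTS = {"bm25": 0.30, "vector": 0.30, "activation": 0.25, "decay": 0.15}
DEGRADED_WEIGHTS = {"bm25": 0.55, "activation": 0.30, "decay": 0.15}

def fuse(signals: dict[str, dict[str, float]],
         has_embeddings: bool) -> dict[str, float]:
    """Weighted sum of the normalized signals; an entity missing from
    a signal contributes 0 for that signal."""
    weights = FULL_WEIGHTS if has_embeddings else DEGRADED_WEIGHTS
    normed = {name: normalize(signals.get(name, {})) for name in weights}
    entities = {e for s in normed.values() for e in s}
    return {e: sum(w * normed[name].get(e, 0.0)
                   for name, w in weights.items())
            for e in entities}
```

In degraded mode the vector signal simply disappears and its weight is redistributed to BM25, which is why the KG stays functional without an embedding provider.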
Vector similarity search requires an embedding provider. DroidClaw supports three providers, all via remote API calls:
| Provider | Default Model | Free Tier | Dimensions |
|---|---|---|---|
| Gemini (recommended) | gemini-embedding-001 | Generous free tier | 768 (up to 3072) |
| OpenAI | text-embedding-3-small | No | 768 (up to 1536) |
| OpenRouter | openai/text-embedding-3-small | No | 768 (up to 1536) |
By default, the embedding provider reuses the LLM provider's API key. A separate key can be configured if needed (e.g., using Anthropic for chat + Gemini for embeddings).
Configure in Settings > Embedding. Enable the Knowledge Graph in Settings > Knowledge.
AppConfig.tools.disabledTools (persisted in SharedPreferences)
↓
toolRegistryProvider (rebuilds on config change)
↓ only registers enabled tools
ToolRegistry
↓
ContextBuilder (system prompt lists available tools)
             ↓
AgentLoop (sends tool definitions to LLM, executes tool calls)
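The flow above is essentially a filter over the full tool set, rebuilt whenever the config changes. The sketch below is illustrative Python (the app itself is Dart/Flutter); the tool names are taken from the zero-config list later in this document, but the spec fields are hypothetical:

```python
# Hypothetical full tool set; only names match DroidClaw's actual tools.
ALL_TOOLS = {
    "web_search":   {"description": "Search the web",        "requires_key": True},
    "get_datetime": {"description": "Current date and time", "requires_key": False},
    "clipboard":    {"description": "Read/write clipboard",  "requires_key": False},
}

def build_registry(disabled_tools: set[str]) -> dict[str, dict]:
    """Rebuild the registry from config: only enabled tools are registered,
    so the LLM never receives definitions for disabled ones."""
    return {name: spec for name, spec in ALL_TOOLS.items()
            if name not in disabled_tools}

# User disables web_search in Settings -> registry rebuilds without it.
registry = build_registry(disabled_tools={"web_search"})
```

Because the system prompt and the tool definitions sent to the LLM are both derived from this registry, a disabled tool is invisible to the model rather than merely rejected at call time.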
- Flutter 3.38+
- Android SDK (API 24+)
flutter pub get
flutter analyze
flutter build apk --release --split-per-abi
adb install build/app/outputs/flutter-apk/app-arm64-v8a-release.apk
- Onboarding: choose your LLM provider (OpenRouter, Anthropic, OpenAI, Groq, Gemini)
- Enter the API key
- Test the connection
- Start chatting
DroidClaw requires API keys for the LLM provider and for some tools. All keys are stored securely on the device (FlutterSecureStorage). No key is ever sent to a third-party server.
You need one LLM provider key to use DroidClaw:
| Provider | Free Tier | Guide |
|---|---|---|
| OpenRouter (recommended) | Free models available, pay-as-you-go | Get key |
| Anthropic (Claude) | $5 trial credit | Get key |
| OpenAI (GPT) | $5 trial credit | Get key |
| Groq (Llama, Mixtral) | Generous free tier | Get key |
| Google Gemini | 15 req/min free | Get key |
These keys unlock specific tools. The agent works without them, but the corresponding tools will be unavailable.
| Service | Tool | Free Tier | Guide |
|---|---|---|---|
| Brave Search | web_search | 2,000 queries/month | Get key |
| OpenRouteService | get_directions | 2,000 req/day | Get key |
| SNCF | get_transit (national trains) | 5,000 req/day | Get key |
| PRIM / IDFM | get_transit (Île-de-France) | 1,000 req/day | Get key |
The Knowledge Graph's vector search uses an embedding API. By default it reuses the LLM provider's API key — a separate key is only needed if you use a different embedding provider.
| Provider | Default Model | Free Tier | Dimensions |
|---|---|---|---|
| Gemini (recommended) | gemini-embedding-001 | Generous free tier | 768 (up to 3072) |
| OpenAI | text-embedding-3-small | No | 768 (up to 1536) |
| OpenRouter | openai/text-embedding-3-small | No | 768 (up to 1536) |
Configure in Settings > Embedding. If no embedding provider is configured, the Knowledge Graph falls back to degraded mode (BM25 text search only — still functional, just less semantically precise).
| Service | Purpose | Guide |
|---|---|---|
| Telegram Bot | Remote access via Telegram | Get token |
These tools work out of the box, no configuration needed: web_scrape, web_scrape_js, file, get_location, get_address, subagent, message, clipboard, geocode, get_datetime, device_info, speak, open_app, set_alarm, notifications, contacts, calendar, ocr, qr_generate, pick_image, volume_control, weather, knowledge_search, knowledge_store, radio.
| Metric | Value |
|---|---|
| Dart files | 111 |
| Analysis issues | 0 |
| APK size (arm64) | 39.5 MB |
| Languages | English, French, Spanish, German, Italian |
| Native code | Kotlin (AudioChannelPlugin — volume control, RadioPlaybackService — Media3 streaming) |
| minSdkVersion | 24 (Android 7.0) |
| targetSdkVersion | 34 (Android 14) |
DroidClaw is pronounced "ARaccoon" — The Raccoon. This is the name shown in the Android launcher and the project's mascot.
- The claws: raccoons are famous for their extremely dexterous front paws, capable of manipulating objects, picking locks, and rummaging everywhere. They are the perfect embodiment of iterative tool calling — the agent that calls web_search, parses the results, follows up with web_scrape, extracts the info, and loops until it finds the answer.
- Intelligence and resourcefulness: clever and adaptable, raccoons always find a solution. Exactly what an AI agent does when it loops, fails, adjusts its strategy, and eventually solves the problem.
- The nocturnal and discreet side: active at night, a bit "bandit-like" (in the cute sense). This evokes the off-grid nature of the application — everything runs locally on the phone, no data is sent to a central server, total privacy.