feat: add persistent embedding daemon to eliminate cold-start latency #6
Open
raoabinav wants to merge 5 commits into
…yichuan-w#166)

Adds `leann serve` command that starts a background embedding server daemon, keeping the model warm between searches. Reduces first-search latency from 30-60s to near-zero by avoiding repeated model loads.

- New `embedding_daemon.py` with daemon lifecycle management (start/stop/status)
- Heartbeat-based health monitoring with stale state cleanup
- EmbeddingServerManager auto-detects running daemon before spawning new servers
- CLI: `leann serve`, `leann serve --stop`, `leann serve --status`
- 18 unit tests covering state management, integration, and CLI

https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
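The heartbeat check with stale state cleanup mentioned in this commit could look roughly like the sketch below. It is not the PR's code: the `last_heartbeat` field name, the 30-second threshold, and both helper names are assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Hypothetical threshold; the PR does not state the real value.
HEARTBEAT_MAX_AGE_S = 30.0


def is_stale(state: dict, now: Optional[float] = None,
             max_age: float = HEARTBEAT_MAX_AGE_S) -> bool:
    """Return True when the heartbeat is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    last = state.get("last_heartbeat")  # assumed field name
    if not isinstance(last, (int, float)):
        return True
    return (now - last) > max_age


def load_live_state(path: Path, now: Optional[float] = None) -> Optional[dict]:
    """Read the state file; delete it and return None if unreadable or stale."""
    try:
        state = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        state = None
    if not isinstance(state, dict) or is_stale(state, now):
        path.unlink(missing_ok=True)  # requires Python 3.8+
        return None
    return state
```

A client would call `load_live_state()` on the state file before trusting the recorded port, so a daemon that died without cleaning up is treated as absent.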
…aemon

- Remove sys.exit() from signal handler to prevent SystemExit during arbitrary code; use shutdown flag instead for clean exit
- Redirect daemon subprocess stderr to ~/.leann/daemon.log instead of DEVNULL so startup failures can be diagnosed
- Include log file path in error messages when daemon fails to start

https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
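A minimal sketch of the shutdown-flag pattern this commit describes, assuming a polling main loop. `ShutdownFlag`, `install_handlers`, and `run_until_shutdown` are hypothetical names, not identifiers from the PR.

```python
import signal
import time


class ShutdownFlag:
    """Records a shutdown request; the handler does nothing else."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # No sys.exit() here: raising SystemExit from a handler could
        # interrupt whatever code happens to be running at that moment.
        self.requested = True


def install_handlers(flag: ShutdownFlag) -> None:
    """Register the flag for SIGTERM and SIGINT (must run in the main thread)."""
    signal.signal(signal.SIGTERM, flag.handle)
    signal.signal(signal.SIGINT, flag.handle)


def run_until_shutdown(flag: ShutdownFlag, tick, poll_interval: float = 0.5) -> None:
    """Call tick() repeatedly, then return normally once shutdown is requested."""
    while not flag.requested:
        tick()
        time.sleep(poll_interval)
```

Because the loop returns normally, cleanup code placed after `run_until_shutdown()` (removing the PID file, stopping the server) runs on the ordinary control path, at the cost of reacting up to `poll_interval` seconds late.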
The log_fh was opened and passed to subprocess.Popen but the parent process never closed its copy, leaking a file descriptor.

https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
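One way to avoid this kind of leak is to open the log inside a `with` block, so the parent's copy is closed as soon as the child has inherited it. The sketch below is illustrative only: `spawn_with_log` is a hypothetical helper, and sending stdout to `DEVNULL` is an assumption the PR does not confirm.

```python
import subprocess
from pathlib import Path


def spawn_with_log(cmd, log_path: Path) -> subprocess.Popen:
    """Start cmd with stderr appended to log_path, without leaking the handle."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The child gets its own duplicate of the descriptor at spawn time,
    # so the parent can close its copy right after Popen returns.
    with open(log_path, "ab") as log_fh:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.DEVNULL,  # assumption: only stderr is logged
            stderr=log_fh,
        )
    return proc
```

The child keeps writing to the log after the `with` block exits, since closing the parent's descriptor does not affect the child's copy.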
The daemon started the embedding server without --passages-file, which meant recompute mode (HNSW needs to resolve passage IDs during graph construction) would silently fail when search went through the daemon.

Thread passages_file through:

- run_daemon() → _start_background() / _run_foreground()
- _run_foreground() → manager.start_server(passages_file=...)
- CLI --passages-file arg for python -m leann.embedding_daemon
- Stored in daemon.json state so clients can verify compatibility

https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
Closes yichuan-w#166
What changed and why:
- `leann serve` command that starts a long-running background process keeping the embedding model loaded in memory. The first `leann search` after boot normally takes 10-30s (model load + tokenizer init). With the daemon pre-warmed, it's < 100ms.
- `--foreground`, `--stop`, `--status`. Writes a PID file to `~/.leann/daemon.pid` and state to `~/.leann/daemon.json`. `--stop` sends SIGTERM and cleans up the PID file. `--status` reads the state file and checks if the process is still alive.
- `EmbeddingServerManager.connect_to_daemon()` added: before starting a new server subprocess, the manager now checks if a daemon is already running on the expected port and reuses it. This is the key integration point: existing `leann search` commands transparently benefit from the daemon without any code changes in the search path.
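The liveness and port checks described above might be sketched as follows. This is not the PR's implementation: `read_pid`, `pid_alive`, and `port_open` are hypothetical helpers, and the signal-0 probe is POSIX-only (it should not be used on Windows, where `os.kill` behaves differently).

```python
import os
import socket
from pathlib import Path
from typing import Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if missing or malformed."""
    try:
        return int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return None


def pid_alive(pid: int) -> bool:
    """POSIX probe: signal 0 checks that the process exists without signalling it."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A status check in this style would treat the daemon as running only when the PID is alive and the expected port accepts connections, since a PID can be reused by an unrelated process after a crash.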