Fix cold-start latency + ZMQ reliability by raoabinav · Pull Request #1 · raoabinav/LEANN

raoabinav · 2026-02-13T02:29:33Z

What changed and why:

Removed redundant _ensure_server_running call inside compute_query_embedding. Previously, every query embedding triggered a second server health check even though the caller (api.py) already called _ensure_server_running. This was the main source of cold-start latency: a double server startup on the first query. Now compute_query_embedding trusts the port it receives from the caller, eliminating the redundant check.
Added ZMQ retry with exponential backoff (0.5s, 1s, 2s) in _compute_embedding_via_server. The original code did a single ZMQ send/recv with no retry: any transient connection failure (server still loading, socket not yet bound) was a hard crash. Now retries up to 3 times with proper socket teardown between attempts.
Set ZMQ.SNDTIMEO=10s and ZMQ.LINGER=0. Without SNDTIMEO, a dead server caused an indefinite hang on socket.send(). Without LINGER=0, socket.close() would block waiting for unsent messages, stalling the retry loop.
enable_warmup is now popped from kwargs instead of being forwarded. It was being passed through to start_server(), which didn't understand it. Now it's consumed by _ensure_server_running, which fires a dummy embedding request after server start, pre-loading the model into GPU memory before the first real query hits.
Replaced print() with logging.getLogger(__name__) in searcher_base. The original used bare print("⚠️ ...") for error reporting, which is invisible in production and breaks structured logging pipelines.
Added timing instrumentation across the entire server lifecycle: start_server, _start_new_server, _ensure_server_running, and compute_query_embedding all log elapsed time so you can actually profile where cold-start time goes.
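
The retry behaviour described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: `call_with_backoff` and its parameters are hypothetical names, and it assumes "retries up to 3 times" means one initial attempt followed by up to three retries, with 0.5s, 1s, and 2s pauses. In the PR each attempt would presumably open a fresh ZMQ socket, apply the send timeout and linger options, do the send/recv, and close the socket before the next try; that part is left to the `attempt` callable here.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_backoff(
    attempt: Callable[[], T],
    delays: Sequence[float] = (0.5, 1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple = (OSError,),
) -> T:
    """Run `attempt`, retrying after each delay in `delays` on failure.

    Assumption: one initial attempt plus len(delays) retries. `attempt`
    is expected to create and tear down its own connection, so a failed
    try leaves nothing behind for the next one.
    """
    last_error = None
    # None marks the initial attempt, which is not preceded by a pause.
    for delay in (None, *delays):
        if delay is not None:
            sleep(delay)
        try:
            return attempt()
        except retry_on as exc:  # TimeoutError and ConnectionError are OSErrors
            last_error = exc
    raise RuntimeError(
        f"gave up after {len(delays) + 1} attempts"
    ) from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller that wants the real pauses simply omits it.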

- Add ColQwenRAG class with easy-to-use CLI for multimodal PDF retrieval - Support for both ColQwen2 and ColPali models with automatic device selection - MPS optimization for Apple Silicon with memory-efficient loading - Complete pipeline: PDF→images→embeddings→HNSW index→search - Multi-vector indexing for fine-grained document matching - Comprehensive user guide and reproduction test script - Resolves yichuan-w#119: ColQwen Doc and Support Management Features: - python -m apps.colqwen_rag build --pdfs ./pdfs/ --index my_index - python -m apps.colqwen_rag search my_index "query text" - python -m apps.colqwen_rag ask my_index --interactive - Automatic CPU fallback for memory constraints - Robust error handling and progress tracking

- Add noqa comments for E402 errors (imports after sys.path modifications) - Remove unused variable assignment in colqwen_rag.py - Use importlib.util.find_spec for dependency checks instead of unused imports - Fix import ordering in test_colqwen_reproduction.py

- Add apps/image_rag.py for indexing and searching images using CLIP embeddings - Supports text-based image search queries - Uses CLIP ViT-L/14 model via sentence-transformers - Follows the same pattern as other RAG apps in the apps directory - Addresses feature request for CLIP support in apps (issue yichuan-w#94)

…ichuan-w#179)

Fixes yichuan-w#175

Problem: When --file-types .pdf is specified, PDFs were being processed twice:
1. Separately with PyMuPDF/pdfplumber extractors
2. Again in the 'other file types' section via SimpleDirectoryReader
This caused duplicate processing and potential conflicts.

Solution:
- Exclude .pdf from other_file_extensions when PDFs are already processed separately
- Only load other file types if there are extensions to process
- Prevents duplicate PDF processing

Changes:
- Added logic to filter out .pdf from code_extensions when loading other file types if PDFs were processed separately
- Updated SimpleDirectoryReader to use filtered extensions
- Added check to skip loading if no other extensions to process
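
The filtering described in this commit message might look something like the sketch below. The function names are hypothetical and the real change lives in the project's loader code, which is not shown on this page; the sketch only illustrates the two rules stated above (drop .pdf when PDFs were already handled, and skip the generic loader when nothing is left).

```python
def extensions_for_generic_loader(requested, pdfs_handled_separately):
    """Return the extensions the generic loader should still process.

    Hypothetical helper: `requested` is the list given via --file-types,
    and `pdfs_handled_separately` says whether a dedicated PDF extractor
    has already consumed the PDFs.
    """
    if not pdfs_handled_separately:
        return list(requested)
    return [ext for ext in requested if ext.lower() != ".pdf"]


def should_run_generic_loader(requested, pdfs_handled_separately):
    """Skip the generic loader entirely when no extensions remain."""
    return bool(extensions_for_generic_loader(requested, pdfs_handled_separately))
```

With `--file-types .pdf` alone, the filtered list is empty and the second pass is skipped, which is the duplicate-processing case the commit addresses.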

…r multi-vector… (yichuan-w#161) * Add timing instrumentation and multi-dataset support for multi-vector retrieval - Add timing measurements for search operations (load and core time) - Increase embedding batch size from 1 to 32 for better performance - Add explicit memory cleanup with del all_embeddings - Support loading and merging multiple datasets with different splits - Add CLI arguments for search method selection (ann/exact/exact-all) - Auto-detect image field names across different dataset structures - Print candidate doc counts for performance monitoring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * update vidore * reproduce docvqa results * reproduce docvqa results and add debug file --------- Co-authored-by: Claude <noreply@anthropic.com>

…pport fo…" (yichuan-w#180) This reverts commit 00770ae.

* Add timing instrumentation and multi-dataset support for multi-vector retrieval - Add timing measurements for search operations (load and core time) - Increase embedding batch size from 1 to 32 for better performance - Add explicit memory cleanup with del all_embeddings - Support loading and merging multiple datasets with different splits - Add CLI arguments for search method selection (ann/exact/exact-all) - Auto-detect image field names across different dataset structures - Print candidate doc counts for performance monitoring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * update vidore * reproduce docvqa results * reproduce docvqa results and add debug file * fix: format colqwen_forward.py to pass pre-commit checks --------- Co-authored-by: Claude <noreply@anthropic.com>

Reset faiss submodule to match main branch to avoid unnecessary changes

- Add ColQwen2.5 and ColQwen2_5_Processor imports - Implement smart model type detection for colqwen2, colqwen2.5, and colpali - Add task name aliases for easier benchmark invocation - Add safe model name handling for file paths and index naming - Support custom model paths including LoRA adapters - Improve model choice validation and error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>

Add brief introduction and usage guide for ColQwen integration, similar to other RAG application sections in the README. - Quick start examples for building, searching, and interactive Q&A - Setup instructions with prerequisites - Model options (ColQwen2 vs ColPali) - Link to detailed ColQwen guide

Add COLQWEN_GUIDE.md to docs/ directory for proper documentation structure. This file is referenced in the README and needs to be tracked in git.

Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>

* Add Anthropic LLM support Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> * Update skypilot link Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> * Handle anthropic base_url Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> * Address ruff format finding Signed-off-by: droctothorpe <mythicalsunlight@gmail.com> --------- Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>

yichuan-w#188) * Add custom folder support and improve image loading for multi-vector retrieval - Enhanced _load_images_from_dir with recursive search support and better error handling - Added support for WebP format and RGB conversion for all image modes - Added custom folder CLI arguments (--custom-folder, --recursive, --rebuild-index) - Improved documentation and removed completed TODO comment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Format code style in leann_multi_vector.py for better readability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

…ration add ColQwen multimodal PDF retrieval integration

…w#189) * Add custom folder support and improve image loading for multi-vector retrieval - Enhanced _load_images_from_dir with recursive search support and better error handling - Added support for WebP format and RGB conversion for all image modes - Added custom folder CLI arguments (--custom-folder, --recursive, --rebuild-index) - Improved documentation and removed completed TODO comment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Format code style in leann_multi_vector.py for better readability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * docs: polish README performance tip section - Fix typo: 'matrilize' -> 'materialize' - Improve clarity and formatting of --no-recompute flag explanation - Add code block for better readability * format --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

…huan-w#157) PR yichuan-w#157 changed create_text_chunks() to return list[dict] instead of list[str] to preserve metadata, but base_rag_example.py was not updated to handle the new format. This caused all chunks to fail validation with "All provided chunks are empty or invalid".

* Drop Python 3.9 support, require Python 3.10+ Python 3.9 reached end-of-life and the codebase uses PEP 604 union type syntax (str | None) which requires Python 3.10+. Changes: - Remove Python 3.9 from CI build matrix - Update requires-python to >=3.10 in all pyproject.toml files - Update classifiers to reflect supported Python versions (3.10-3.13) * Remove macos-13 from CI build matrix GitHub Actions deprecated macos-13 runner (brownout started Sept 2025, fully retired Dec 2025). See: https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/ * Add macos-15-large for Intel Mac builds Replace deprecated macos-13 with macos-15-large (x86_64 Intel) to continue supporting Intel Mac users. * Set MACOSX_DEPLOYMENT_TARGET=13.x for Intel builds Intel Mac wheels (macos-15-large) now target macOS 13.0/13.3 for backward compatibility, allowing macOS 13/14/15 Intel users to install pre-built wheels. * Remove Intel Mac builds (macos-15-large requires paid plan) Intel Mac users can build from source. This avoids: - Paid GitHub Actions runners (macos-15-large) - Complex cross-compilation setup * Add macos-15-intel for Intel Mac builds (free runner) Use macos-15-intel (free standard runner) instead of macos-15-large (paid). This provides Intel Mac wheel support until Aug 2027. - MACOSX_DEPLOYMENT_TARGET=13.0 for backward compatibility - Replaces deprecated macos-13 runner * Add macOS 26 (beta) to build matrix Add macos-26 (arm64) runner to the build matrix for testing future macOS compatibility. This is currently a beta runner that helps ensure wheels work on upcoming macOS versions. * Fix macos-15-intel deployment target The macos-15-intel runner runs macOS 15.7, so Homebrew libraries are built for macOS 14+. Setting MACOSX_DEPLOYMENT_TARGET=13.0 causes delocate to fail because system libraries require newer macOS. Fix by setting deployment target to 15.0 for macos-15-intel, matching the actual OS version. Intel Mac users will need macOS 15+. 
* Exclude macos-15-intel + Python 3.13 (no PyTorch wheels available)

…uan-w#157) (yichuan-w#192) * Add ty type checker to CI and fix type errors - Add ty (Astral's fast Python type checker) to GitHub CI workflow - Fix type annotations across all RAG apps: - Update load_data return types from list[str] to list[dict[str, Any]] - Fix base_rag_example.py to properly handle dict format from create_text_chunks - Fix type errors in leann-core: - chunking_utils.py: Add explicit type annotations - cli.py: Fix return type annotations for PDF extraction functions - interactive_utils.py: Fix readline import type handling - Fix type errors in apps: - wechat_history.py: Fix return type annotations - document_rag.py, code_rag.py: Replace **kwargs with explicit arguments - Add ty configuration to pyproject.toml This resolves the bug introduced in PR yichuan-w#157 where create_text_chunks() changed to return list[dict] but callers were not updated. * Fix remaining ty type errors - Fix slack_mcp_reader.py channel parameter can be None - Fix embedding_compute.py ContextProp type issue - Fix searcher_base.py method override signatures - Fix chunking_utils.py chunk_text assignment - Fix slack_rag.py and twitter_rag.py return types - Fix email.py and image_rag.py method overrides * Fix multimodal benchmark scripts type errors - Fix undefined LeannRetriever -> LeannMultiVector - Add proper type casts for HuggingFace Dataset iteration - Cast task config values to correct types - Add type annotations for dataset row dicts * Enable ty check for multimodal scripts in CI All type errors in multimodal scripts have been fixed, so we can now include them in the CI type checking. 
* Fix all test type errors and enable ty check on tests - Fix test_basic.py: search() takes str not list - Fix test_cli_prompt_template.py: add type: ignore for Mock assignments - Fix test_prompt_template_persistence.py: match BaseSearcher.search signature - Fix test_prompt_template_e2e.py: add type narrowing asserts after skip - Fix test_readme_examples.py: use explicit kwargs instead of **model_args - Fix metadata_filter.py: allow Optional[MetadataFilters] - Update CI to run ty check on tests * Format code with ruff * Format searcher_base.py

Thanks for the contribution! 🎉

* Add prompt template feature to README Highlights performance optimization with task-specific prompt templates. Includes real-world benchmark data showing EmbeddingGemma 300M achieving 4-5x speed improvement over Qwen 600M while maintaining identical search quality. Per maintainer request to promote this feature in main README for better discoverability. * Fix typo: --embedding-prompt-template -> --query-prompt-template --------- Co-authored-by: Andy Lee <andylizf@outlook.com>

…#197) Thanks for the contribution! This is a nice improvement for better UX. 🎉

- Add Jina AI to the cloud providers table with (Embeddings) label - Add tip section explaining how to use separate embedding provider with --embedding-api-base and --embedding-api-key flags

- Add LEANN_EMBEDDING_DEVICE env var for embedding model GPU selection - Add LEANN_LLM_DEVICE env var for HFChat LLM GPU selection - When specific GPU (e.g., cuda:1) is set, use it exclusively - When set to "cuda" or unset, use device_map="auto" for multi-GPU - Document env vars in README Common Parameters section

- Add batch_size parameter support in provider_options/embedding_options - When user specifies batch_size, disable adaptive_optimization - Keep default Qwen3-Embedding batch_size (32) as fallback
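
As a rough illustration of the precedence described here, the sketch below resolves a batch size from an options dict. `resolve_batching` is a hypothetical name and the return shape is an assumption; only the option key `batch_size`, the `adaptive_optimization` toggle, and the fallback of 32 come from the commit message.

```python
DEFAULT_BATCH_SIZE = 32  # fallback named in the commit message


def resolve_batching(provider_options=None):
    """Return (batch_size, adaptive_optimization) for an embedding run.

    Assumed behaviour: an explicit user batch_size wins and switches
    adaptive optimization off; otherwise the default is kept and
    adaptive optimization stays on.
    """
    options = provider_options or {}
    user_value = options.get("batch_size")
    if user_value is None:
        return DEFAULT_BATCH_SIZE, True
    batch_size = int(user_value)
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return batch_size, False
```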

Content already exists in docs/configuration-guide.md. The section was too prominent for an advanced feature and cluttered the README structure.

…e in README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add comprehensive documentation for Claude Code instances working with this repository, including build commands, architecture overview, testing instructions, and key design patterns. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Ensure ColQwenRAG always sets the processor when the model reloads on CPU due to memory constraints, preventing embed-time attribute errors. Co-authored-by: Cursor <cursoragent@cursor.com>

Fail fast with a clear error when transformers>=4.46 is installed, and delay colpali_engine imports until after the version check to avoid HybridCache import crashes. Co-authored-by: Cursor <cursoragent@cursor.com>

Clean up unused type suppression comments flagged by ty across apps and core packages. Co-authored-by: Cursor <cursoragent@cursor.com>

Normalize torch.compile call formatting after ruff. Co-authored-by: Cursor <cursoragent@cursor.com>

…eddings Fix/colqwen empty embeddings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ensure all index-* CLI commands accept embedding model/mode arguments to match their builder usage. Co-authored-by: Cursor <cursoragent@cursor.com>

…ive-ingestion-and-formats Comprehensive Data Ingestion & New Format Support

…#241)

… suppressed search output (yichuan-w#242)

- Add `leann watch` CLI command that compares current files against the last Merkle tree checkpoint and reports added/removed/modified files with their associated chunk IDs.
- Integrate FileSynchronizer into `leann build` to create initial snapshots and persist sync config (sync_roots.json).
- Prepend line numbers to code file chunks (e.g. `42|def foo():`) so search results display exact line locations for code navigation.
- Trim partial first lines in code chunks caused by character-based overlap to ensure every chunk starts at a clean line boundary.
- Fix `suppress_cpp_output` swallowing Python print() along with C++ output by redirecting sys.stdout/sys.stderr to saved fd copies while OS-level fds go to /dev/null.
- Update README with watch command documentation in Quick Start, Usage Examples, and Complete CLI Reference sections.

…yichuan-w#177)

- Fix enable_warmup: pop kwarg in _ensure_server_running and send dummy embedding request after server starts (was previously passed to start_server as unused kwarg)
- Remove redundant _ensure_server_running call in compute_query_embedding; caller (api.py) already ensures server is running before search
- Add retry with exponential backoff (0.5s, 1s, 2s) to ZMQ client in _compute_embedding_via_server, with proper socket cleanup between attempts
- Add SNDTIMEO (10s) and LINGER(0) to ZMQ sockets for clean failure
- Add timing instrumentation throughout searcher_base.py and embedding_server_manager.py for diagnosing startup and query latency
- Add tests/test_cold_start.py with 13 unit tests

https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
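
The enable_warmup fix in the first bullet follows a common pattern: consume the option where it is understood and forward only the remaining keyword arguments. The sketch below shows that pattern in isolation. `ensure_server_running`, `start_server`, and `warm_up` here are stand-in callables with assumed signatures, not the project's actual functions.

```python
from typing import Any, Callable


def ensure_server_running(
    start_server: Callable[..., int],
    warm_up: Callable[[int], None],
    **kwargs: Any,
) -> int:
    """Start the server and optionally warm it up.

    `enable_warmup` is popped so it never reaches `start_server`,
    which (per the commit message) did not use it.
    """
    enable_warmup = bool(kwargs.pop("enable_warmup", False))
    port = start_server(**kwargs)  # only options start_server understands
    if enable_warmup:
        # Stand-in for the dummy embedding request sent after startup.
        warm_up(port)
    return port
```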

replace all print("[leann] ...") calls with proper logger.info/warning so timing output respects LEANN_LOG_LEVEL and doesn't pollute stdout. https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT
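
A minimal sketch of the idea, assuming the log level is read from the LEANN_LOG_LEVEL environment variable and defaults to WARNING. How the project actually parses that variable is not shown on this page, and `configure_logging` and `log_elapsed` are hypothetical helpers.

```python
import logging
import os
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(env=os.environ):
    """Set the logger level from LEANN_LOG_LEVEL (assumed default: WARNING)."""
    name = env.get("LEANN_LOG_LEVEL", "WARNING").upper()
    level = _LEVELS.get(name, logging.WARNING)
    logger.setLevel(level)
    return level


@contextmanager
def log_elapsed(label):
    """Log how long the wrapped block took, at INFO level."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.3fs", label, time.perf_counter() - start)
```

Wrapping a call as `with log_elapsed("start_server"): ...` emits a timing line only when the level is INFO or lower, so stdout stays clean by default.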

…stence check (yichuan-w#245)

- Remove erroneous first-line trimming in create_ast_chunks that stripped function signatures (e.g. `def hello():`) by assuming all chunks have line-number prefixes starting with digits
- Move line number prepending from before AST chunking to after, so the AST parser receives valid source code instead of `1|def hello():` which breaks syntax tree parsing and causes fallback to naive text splitting
- Fix index existence check in base_rag_example.py to look for the actual .meta.json file instead of just the directory (empty temp dirs always exist)

Co-authored-by: Cursor <cursoragent@cursor.com>
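
The ordering problem in the second bullet can be reproduced with a small stand-in. The sketch below uses Python's built-in `ast` module as the parser, which is an assumption made for illustration (the project's AST chunker may use a different parser), and `add_line_numbers` is a hypothetical helper that produces the `N|code` prefix format mentioned in the commit message.

```python
import ast


def add_line_numbers(chunk, start_line=1):
    """Prefix each line with 'N|', e.g. '42|def foo():'."""
    return "\n".join(
        f"{number}|{line}"
        for number, line in enumerate(chunk.splitlines(), start=start_line)
    )


def parses(source):
    """Report whether `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


source = "def hello():\n    return 1\n"
numbered = add_line_numbers(source)

# Parse (and chunk) the raw source first; number the lines afterwards.
print(parses(source))    # raw source is valid Python
print(parses(numbered))  # prefixed text is no longer valid Python
```

Numbering after chunking keeps the parser's input valid while still giving search results their line locations.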

- Add zmq_port is not None guard before server path (ty invalid-argument-type) - Fix _TestSearcher.search override to match interface signature (ty invalid-method-override) - Put logger.error/logger.warning on single lines (ruff format) Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

…ld-start-LONFb

SuperPauly · 2026-03-05T23:18:44Z

This PR is rather large: it has changed 78 files across 92 commits and has been going on for 4 months.

Will it be merged into main any time soon?

ASuresh0524 and others added 30 commits November 10, 2025 13:31

docs: survey

eb909cc

Revert "[Multi-vector]Add timing instrumentation and multi-dataset su…

d599566

…pport fo…" (yichuan-w#180) This reverts commit 00770ae.

Revert unnecessary faiss submodule update

86287d8

Reset faiss submodule to match main branch to avoid unnecessary changes

fix: Update ColQwen guide link to docs/ directory

af47dfd

docs: Add ColQwen guide to docs directory

0175bc9

Add COLQWEN_GUIDE.md to docs/ directory for proper documentation structure. This file is referenced in the README and needs to be tracked in git.

Use logger instead of print (yichuan-w#186)

3629ccf

Signed-off-by: droctothorpe <mythicalsunlight@gmail.com>

Merge pull request yichuan-w#162 from yichuan-w/feature/colqwen-integ…

d1b3c93

…ration add ColQwen multimodal PDF retrieval integration

Move COLQWEN_GUIDE.md to docs and remove test_colqwen_reproduction.py

a1c21ad

docs: Update repository URL in CONTRIBUTING.md (yichuan-w#200)

1ca0d3f

Thanks for the contribution! 🎉

feat: add configurable verbosity for FAISS/HNSW C++ output (yichuan-w…

ef475e9

…#197) Thanks for the contribution! This is a nice improvement for better UX. 🎉

docs: add Jina AI as OpenAI-compatible embedding provider

22ca2da

- Add Jina AI to the cloud providers table with (Embeddings) label - Add tip section explaining how to use separate embedding provider with --embedding-api-base and --embedding-api-key flags

feat: allow batch_size override via provider_options (yichuan-w#205)

733076a

- Add batch_size parameter support in provider_options/embedding_options - When user specifies batch_size, disable adaptive_optimization - Keep default Qwen3-Embedding batch_size (32) as fallback

docs: remove redundant prompt template section from README

dd38a78

Content already exists in docs/configuration-guide.md. The section was too prominent for an advanced feature and cluttered the README structure.

feat: add hybrid search

b2e1f5a

tolgakaratas and others added 29 commits January 26, 2026 03:39

docs: restore value propositions and enhance feature list in README

e2b1a57

docs: restore core value propositions and Claude Code integration not…

becb1dc

…e in README

docs: Add Trendshift trending badge to README

6fe787d

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update Slack invitation link

939f049

Fix ColQwen build crash when no images are extracted

f4ffd86

Fix ColQwen build crash when no images are extracted (yichuan-w#230)

6013237

docs: Add CLAUDE.md for Claude Code guidance

98d7193

Add comprehensive documentation for Claude Code instances working with this repository, including build commands, architecture overview, testing instructions, and key design patterns. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix: initialize processor in fallback path

f09e77f

Ensure ColQwenRAG always sets the processor when the model reloads on CPU due to memory constraints, preventing embed-time attribute errors. Co-authored-by: Cursor <cursoragent@cursor.com>

fix: guard against unsupported transformers

aa9864c

Fail fast with a clear error when transformers>=4.46 is installed, and delay colpali_engine imports until after the version check to avoid HybridCache import crashes. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'main' into fix/colqwen-empty-embeddings

77f6299

chore: remove unused type ignores

6734011

Clean up unused type suppression comments flagged by ty across apps and core packages. Co-authored-by: Cursor <cursoragent@cursor.com>

chore: apply ruff format

347e31f

Normalize torch.compile call formatting after ruff. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request yichuan-w#235 from yichuan-w/fix/colqwen-empty-emb…

e4bf950

…eddings Fix/colqwen empty embeddings

docs: Add prominent Slack community invitation to README

29fbb19

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: fixing some command in the contribution page (yichuan-w#238)

8289419

Relax transformers pin for Py>=3.10 (yichuan-w#240)

6559c6b

fix: add embedding args for index commands

4cdc622

Ensure all index-* CLI commands accept embedding model/mode arguments to match their builder usage. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request yichuan-w#227 from tolgakaratas/feature/comprehens…

2c79621

…ive-ingestion-and-formats Comprehensive Data Ingestion & New Format Support

Revert "Comprehensive Data Ingestion & New Format Support" (yichuan-w…

b7eaeac

…#241)

chore: release v0.3.7

67f61e2

use logger instead of print in searcher_base (yichuan-w#177)

71df147

replace all print("[leann] ...") calls with proper logger.info/warning so timing output respects LEANN_LOG_LEVEL and doesn't pollute stdout. https://claude.ai/code/session_01M6abMs1YzF6yhh13YerDPT

fix: apply ruff check and format for CI lint

16f4064

Co-authored-by: Cursor <cursoragent@cursor.com>

fix: apply ruff check and format for CI

ef8af8d

Merge remote-tracking branch 'upstream/main' into abinav/issue-177-co…

1cc6817

…ld-start-LONFb

fix(ci): apply ruff check and format

4bbd811


raoabinav wants to merge 92 commits into main from abinav/issue-177-cold-start-LONFb


17 participants
