perf: lazy-import torch and tiktoken in embedding_compute / chat #323

Open
raoabinav wants to merge 1 commit into yichuan-w:main from raoabinav:perf/lazy-import-torch
Contributor

@raoabinav raoabinav commented May 20, 2026

`embedding_compute.py` and `chat.py` import torch / tiktoken at module top, so `import leann` pulls torch in up front even for callers that only do MCP search or BM25 lookups. Moved both imports into the functions that actually use them, so they are loaded lazily.

embedding_compute.py:14-16 and chat.py:13 import torch / tiktoken at
module top, which means `import leann` pulls ~1 GB of torch state even
for callers that only do MCP search over a prebuilt index, BM25-only
queries, or other paths that never touch the embedding pipeline.

Moved torch into the two functions that actually use it
(compute_embeddings_sentence_transformers, HFLLM.ask). The lazy imports
in HFLLM.__init__ and compute_embeddings_ollama were already
function-local, so they're unchanged. Moved tiktoken into
truncate_to_token_limit.
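
For illustration, the function-local import pattern described above looks
roughly like the sketch below. It is not the actual diff: `mean_sketch` is a
hypothetical name, and the stdlib `statistics` module stands in for torch /
tiktoken so the snippet runs without heavy dependencies.

```python
import sys


def mean_sketch(values):
    """Hypothetical stand-in for a function that needs a heavy dependency."""
    # Deferred import: the module is loaded on the first call only.
    # Later calls hit the sys.modules cache, so the repeated import
    # statement costs little more than a dictionary lookup.
    import statistics  # stand-in for `import torch` / `import tiktoken`

    return statistics.mean(values)


if __name__ == "__main__":
    print(mean_sketch([1, 2, 3]))
    # After the first call the dependency is cached in sys.modules.
    print("statistics" in sys.modules)
```

The trade-off is that the import cost moves from `import leann` to the first
call of the function that needs the dependency.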

`import leann` drops from ~6700ms to ~128ms locally; torch and tiktoken
stay out of sys.modules until first real use.
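
One way to reproduce this kind of measurement is sketched below. It assumes
nothing about how the numbers above were collected: `probe_import` and
`PROBE` are hypothetical names. The import is timed in a fresh interpreter so
that modules cached in the current process do not skew the result.

```python
import json
import subprocess
import sys

# Code run in the child interpreter: time one import, then report which
# watched modules ended up in sys.modules.
PROBE = """
import sys, time
start = time.perf_counter()
__import__(sys.argv[1])
elapsed_ms = (time.perf_counter() - start) * 1000
loaded = {name: name in sys.modules for name in sys.argv[2:]}
import json
print(json.dumps({"ms": elapsed_ms, "loaded": loaded}))
"""


def probe_import(module, watch):
    """Import `module` in a fresh process and report time and loaded modules."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, module, *watch],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last stdout line in case the imported module prints anything.
    return json.loads(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # With leann installed, this would be probe_import("leann", ["torch", "tiktoken"]).
    print(probe_import("json", ["json"]))
```

CPython's `-X importtime` flag gives a per-module breakdown if more detail is
needed.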

I'm assuming the eager imports were just convenience and not load-bearing
in any way I'm missing (e.g. catching ImportError up-front for a clearer
error message). Happy to revisit if there's a reason they need to be
loaded early.
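
If a clear up-front error does matter, one option is to check that the
dependency is installed without importing it. The sketch below is only an
illustration of that idea and is not part of this change; `require_module`
and its `hint` parameter are hypothetical.

```python
import importlib.util


def require_module(name, hint=""):
    """Raise a clear ImportError if top-level module `name` is not installed.

    find_spec() only locates a top-level module; it does not execute it,
    so the check stays cheap even for heavy packages.
    """
    if importlib.util.find_spec(name) is None:
        message = f"'{name}' is required for this code path but is not installed."
        if hint:
            message = f"{message} {hint}"
        raise ImportError(message)


if __name__ == "__main__":
    require_module("json")  # installed, so this passes silently
    try:
        require_module("no_such_module_for_demo", hint="Install it first.")
    except ImportError as exc:
        print(exc)
```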

I didn't find an existing issue for this — happy to open one if you'd
prefer that path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>