feat(lsm): expose lsm api to python #6259
zhangyue19921010 wants to merge 5 commits into lance-format:main from
PR Review: feat(lsm): expose lsm api to python

Overall this is a well-structured PR with good test coverage and clean Python/Rust layering. A few issues to flag:

P0: GIL not released in `ExecutionPlanReader::next()`

In `python/src/mem_wal.rs`, the `ExecutionPlanReader::next()` implementation calls `rt().spawn(None, ...)`, passing `None` instead of `Some(py)`. This means the GIL is not released while the tokio future executes. Since `next()` is called from Python, which holds the GIL, the GIL stays held for the entire duration of each batch fetch. If any downstream async work (or another Python thread) needs the GIL, this will deadlock.

Every other async call site in this PR correctly uses `rt().block_on(Some(py), ...)` to release the GIL. However, `next()` does not have access to `py` because it implements `Iterator`. Consider restructuring, e.g. collecting batches eagerly in `to_reader` while holding `py`, or using `pyo3-async-runtimes` to properly yield the GIL during iteration.

P1: `put()` collects all batches into memory

The `put()` method in `PyRegionWriter` materializes the entire input stream into a `Vec` before writing. Per project guidelines, this can OOM on large writes. Consider streaming batches to the writer incrementally instead of collecting them all at once.

Minor
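The P1 point is about peak memory rather than throughput. The sketch below, written in Python for brevity rather than in the PR's Rust binding code, contrasts draining the whole input before writing with writing each batch as it arrives. `BatchWriter`, `write_batch`, `put_materialized`, `put_streaming` and `Tracker` are hypothetical names for illustration, not Lance APIs, and batches are plain lists standing in for Arrow record batches.

```python
from typing import Iterable, Iterator, List, Protocol

Batch = List[int]  # hypothetical stand-in for an Arrow RecordBatch


class BatchWriter(Protocol):
    """Hypothetical writer interface, not the Lance API."""

    def write_batch(self, batch: Batch) -> None: ...


def put_materialized(writer: BatchWriter, batches: Iterable[Batch]) -> int:
    """Pattern flagged in review: drain the whole input before writing."""
    collected = list(batches)  # every batch is resident at the same time
    for batch in collected:
        writer.write_batch(batch)
    return len(collected)


def put_streaming(writer: BatchWriter, batches: Iterable[Batch]) -> int:
    """Write each batch as soon as the source yields it.

    This function holds only the current batch, so its memory use is
    bounded by the largest single batch, not by the size of the input.
    """
    written = 0
    for batch in batches:
        writer.write_batch(batch)
        written += 1
    return written


class Tracker:
    """Counts batches produced by the source but not yet written."""

    def __init__(self) -> None:
        self.outstanding = 0
        self.peak = 0

    def source(self, n: int, size: int) -> Iterator[Batch]:
        for i in range(n):
            self.outstanding += 1
            self.peak = max(self.peak, self.outstanding)
            yield [i] * size

    def write_batch(self, batch: Batch) -> None:
        self.outstanding -= 1


streaming = Tracker()
put_streaming(streaming, streaming.source(5, 3))

materialized = Tracker()
put_materialized(materialized, materialized.source(5, 3))

print(streaming.peak, materialized.peak)  # 1 5
```

With five input batches, at most one batch is ever outstanding in the streaming version, while the materialized version holds all five before the first write. That gap grows with the input size, which is why the review asks for incremental writes.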
All fixed.
Hi @jackye1995, sorry to bother you. Would you mind taking a look at this PR at your convenience? I'd really appreciate it!
Expose LSM-related APIs to Python
Writer:
Reader:
Optimize: