Benchmark updates on duckdb by guillesd · Pull Request #202 · lance-format/lance-duckdb

guillesd · 2026-05-04T13:51:29Z

The following has been modified:

vector_exact: DuckDB does not do well with vss_match, so we keep the same plain approach used for parquet.
vector_index: This is the biggest change. HNSW was not firing because of the WHERE clause (a current limitation). It also accepts only one column in the ORDER BY clause, which is why sample_id is removed everywhere for consistency. This does not change the performance of lance.
hybrid_search: The same optimization as in vector_index applies: we needed HNSW to fire, so we removed the extra column from ORDER BY.
blob_read: Most of the time was spent on the table scan, which DuckDB could not optimize much without an index on the join column. We changed that, and instead of returning octet_length we now deserialize the whole image, since that feels more in line with "blob_read". I don't mind if we do octet_length, but then what are we really proving with this query?

HNSW was also dormant because SET hnsw_enable_experimental_persistence = true was never set in the query session.
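
For illustration, a minimal sketch of the query shape described above, assuming DuckDB's vss extension. The table name items, the index name items_hnsw, the 3-dimensional embedding column, and the sample data are hypothetical and are not taken from the benchmark itself.

```sql
-- Hypothetical setup; names and data are illustrative only.
INSTALL vss;
LOAD vss;

-- Needed in the session when the HNSW index lives in a persistent (on-disk) database.
SET hnsw_enable_experimental_persistence = true;

CREATE TABLE items (sample_id INTEGER, embedding FLOAT[3]);
INSERT INTO items VALUES
    (1, [1.0, 2.0, 3.0]),
    (2, [2.0, 2.0, 2.0]),
    (3, [9.0, 9.0, 9.0]);

CREATE INDEX items_hnsw ON items USING HNSW (embedding);

-- Shape that lets the index fire: no WHERE clause, a single distance
-- expression in ORDER BY, and a LIMIT.
SELECT sample_id
FROM items
ORDER BY array_distance(embedding, [1.0, 2.0, 3.0]::FLOAT[3])
LIMIT 2;

-- Shape that, per the notes above, keeps HNSW from firing:
-- a second ORDER BY column used as a tiebreaker.
SELECT sample_id
FROM items
ORDER BY array_distance(embedding, [1.0, 2.0, 3.0]::FLOAT[3]), sample_id
LIMIT 2;
```

Prefixing either query with EXPLAIN should show whether the plan contains an HNSW_INDEX_SCAN operator or falls back to a regular scan plus sort.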

Xuanwo · 2026-05-07T11:39:10Z

Thanks, this makes sense to me.

The original goal of this benchmark was to cover a few different retrieval shapes on the same dataset: lexical search, exact vector search, indexed vector search, hybrid search, and blob-heavy reads. Some of the original workloads were also intended to reflect filtered retrieval, not just best-case indexed latency.

That said, I agree with the direction here. In particular, it makes sense to avoid query shapes that prevent HNSW from firing, and I agree that the previous vector_exact path was not a good representation for DuckDB.

I’m supportive of merging this. We can follow up on our side later to clarify the workload definitions and benchmark notes if needed.

guillesd added 2 commits May 4, 2026 15:13

Performance optimizations for duckdb plus consistency

5e4ab90

Actually read blobs to measure deserialization performance

628d8b5

prrao87 requested a review from Xuanwo May 4, 2026 14:37

Xuanwo approved these changes May 7, 2026

Xuanwo changed the title ~~Benchmark updates~~ Benchmark updates on duckdb May 7, 2026

Xuanwo merged commit 5b1e3a4 into lance-format:main May 7, 2026
5 checks passed

Xuanwo merged 2 commits into lance-format:main from guillesd:benchmark-updates
