Expand ANNLite capabilities with BM25 to build Hybrid Search #19

Open
Nick17t opened this issue Feb 24, 2023 · 5 comments

Contributor

Nick17t commented Feb 24, 2023

Project idea 4: Expand ANNLite capabilities with BM25 to build Hybrid Search

Skills needed: Python, C++, Lucene, ANN, Inverted Index
Project size: 350 hours
Difficulty level: Hard
Mentors: @Felix Wang @Joan Martínez @Girish Chandrashekar

Project Description

  • Related to the "Research about deploying LLM with Jina" project, another interesting direction is to incorporate BM25 and Hybrid Search into ANNLite, which would let Jina offer a strong default for building scalable Hybrid Search solutions in the cloud.
  • ANNLite is a vector search library developed by Jina that uses HNSW as its search algorithm. On top of this, it supports filtering of Documents.
  • However, it can be important for the performance of search systems to be able to combine Vector Search algorithms with traditional text-search ones to get the best of both worlds.
  • This project is about evaluating and trying to apply Hybrid Search approaches on top of ANNLite.
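
One common way to combine a vector ranking with a text ranking is reciprocal rank fusion (RRF). The sketch below is only an illustration of the idea, not ANNLite's API: the function name, the constant `k=60`, and the two hit lists are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document receives 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector (HNSW) search and a BM25 search.
vector_hits = ["d3", "d1", "d2"]
bm25_hits = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

RRF only needs ranks, so it avoids having to put BM25 scores and vector distances on a common scale.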

Resources:

Expected outcomes

  • ANNLite is ready to be used as a default library for building Hybrid Search applications.
@Nick17t Nick17t added the ideas label Feb 24, 2023
Member

JoanFM commented Feb 27, 2023

ANNLite is a vector search library developed by Jina that uses HNSW as its search algorithm. On top of this, it supports filtering of Documents.

However, it can be important for the performance of search systems to be able to combine Vector Search algorithms with traditional text-search ones to get the best of both worlds.

This project is about evaluating and trying to apply Hybrid Search approaches on top of ANNLite.

Resources:

  • ANNLite GitHub [https://github.com/jina-ai/annlite]
  • BM25 [https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables]
  • HNSW [https://github.com/naver/splade]
  • Splade [https://github.com/naver/splade]
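
For readers new to BM25, here is a minimal sketch of the scoring function in the form described in the Elastic article linked above. It assumes whitespace tokenization and the common defaults `k1=1.2` and `b=0.75`. The function name and the sample documents are made up for illustration and are not part of ANNLite.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up corpus for the example.
docs = [
    "vector search with hnsw",
    "bm25 ranks text by term frequency",
    "hybrid search combines bm25 and vector search",
]
scores = bm25_scores("bm25 search", docs)
print(scores)
```

A production implementation would keep these statistics in an inverted index rather than rescanning every document per query.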


matchyc commented Mar 3, 2023

Hi, Michael here. I'm familiar with ANNS, including various graph-based indexes (NSG, HNSW, Vamana, etc.), and a contributor to the Milvus community (advanced vector database project). I'm trying to understand the details in ANNLite. I will deliver my proposal draft as soon as possible.

Just a few concerns: Does this project need to integrate models to generate sparse vectors? It means that we only need to focus on hybrid search (maybe hybrid index construction) not how the input vectors (dense or sparse) are produced, am I correct?

Member

JoanFM commented Mar 3, 2023

> Hi, Michael here. I'm familiar with ANNS, including various graph-based indexes (NSG, HNSW, Vamana, etc.), and a contributor to the Milvus community (advanced vector database project). I'm trying to understand the details in ANNLite. I will deliver my proposal draft as soon as possible.
>
> Just a few concerns: Does this project need to integrate models to generate sparse vectors? It means that we only need to focus on hybrid search (maybe hybrid index construction) not how the input vectors (dense or sparse) are produced, am I correct?

That is correct. At least at the beginning, the project should not care about how the vectors are created.
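
To illustrate that scope, here is a rough sketch of scoring one document when both the dense and sparse vectors are supplied by the caller. The function name, the dict-based sparse format, and the `alpha` weight are assumptions for the example and do not describe an existing ANNLite interface.

```python
def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.5):
    """Weighted sum of a dense dot product and a sparse dot product.

    Dense vectors are equal-length lists of floats. Sparse vectors are
    dicts mapping a term (or term id) to its weight.
    """
    dense = sum(q * d for q, d in zip(q_dense, d_dense))
    sparse = sum(w * d_sparse.get(term, 0.0) for term, w in q_sparse.items())
    return alpha * dense + (1 - alpha) * sparse


# Precomputed inputs; how they were produced is out of scope here.
score = hybrid_score(
    q_dense=[1.0, 0.0],
    d_dense=[0.5, 0.5],
    q_sparse={"bm25": 2.0},
    d_sparse={"bm25": 1.0, "hnsw": 3.0},
    alpha=0.5,
)
print(score)
```

In practice the two partial scores usually need normalization before they are mixed, since their ranges can differ widely.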


matchyc commented Mar 29, 2023

I'm going to submit the proposal with a detailed framework design, but should I talk to mentors before submitting it?

@Nasafato

I think you can submit first. I just submitted for this as well.
