
Add DeepSeek integration blog #3595

Merged

kolchfa-aws
Collaborator

Closes #3587

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Member

@nateynateynate left a comment

Looks good!

@nateynateynate merged commit 430d0d4 into opensearch-project:main on Jan 30, 2025
5 checks passed
Collaborator

@natebower left a comment

@kolchfa-aws Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

redirect_from: '/authors/vikash/'
---

**Vikash Tiwari** is a Senior Software Development Engineer at Amazon Web Services, specializing in OpenSearch Vector Search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.
Collaborator

Suggested change
**Vikash Tiwari** is a Senior Software Development Engineer at Amazon Web Services, specializing in OpenSearch Vector Search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.
**Vikash Tiwari** is a Senior Software Development Engineer at AWS specializing in OpenSearch vector search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.


## RAG with OpenSearch and DeepSeek

The following diagram presents the RAG workflow in OpenSearch using the DeepSeek model.
Collaborator

Suggested change
The following diagram presents the RAG workflow in OpenSearch using the DeepSeek model.
The following diagram depicts the RAG workflow in OpenSearch using the DeepSeek model.
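As a rough illustration of the flow in the diagram (embed the question, retrieve similar documents, then generate an answer), the following dependency-free Python sketch wires the three stages together. The `embed`, `search`, and `generate` callables and the prompt template are hypothetical placeholders, assumed to stand in for the SentenceTransformer model, the OpenSearch k-NN query, and the DeepSeek model; this is not the script from the blog post.

```python
from typing import Callable, List, Sequence


def rag_answer(
    question: str,
    embed: Callable[[str], Sequence[float]],  # hypothetical: wraps the embedding model
    search: Callable[[Sequence[float], int], List[str]],  # hypothetical: wraps the k-NN query
    generate: Callable[[str], str],  # hypothetical: wraps the language model
    k: int = 3,
) -> str:
    """Retrieve-then-generate sketch of the RAG workflow."""
    # 1. Turn the question into a vector with the embedding model.
    query_vector = embed(question)
    # 2. Fetch the texts of the k most similar documents from the vector store.
    context_docs = search(query_vector, k)
    # 3. Augment the prompt with the retrieved context (assumed template).
    context = "\n".join(f"- {doc}" for doc in context_docs)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
    # 4. Let the language model generate the final answer from the augmented prompt.
    return generate(prompt)
```

Passing the stages in as functions keeps the retrieval and prompt logic testable with stubs before an OpenSearch cluster or the model weights are available.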

```bash
pip install opensearch-py transformers torch sentence-transformers
```

These packages form the backbone of our RAG system:
Collaborator

Suggested change
These packages form the backbone of our RAG system:
These packages form the backbone of the RAG system:


These packages form the backbone of our RAG system:

* `opensearch-py`: The official Python client for OpenSearch.
Collaborator

Suggested change
* `opensearch-py`: The official Python client for OpenSearch.
* `opensearch-py`: The official Python client for OpenSearch

These packages form the backbone of our RAG system:

* `opensearch-py`: The official Python client for OpenSearch.
* `transformers`: Hugging Face's library for working with transformer models.
Collaborator

Suggested change
* `transformers`: Hugging Face's library for working with transformer models.
* `transformers`: Hugging Face's library for working with transformer models
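The pip package names in the install command do not always match the Python import names (for example, `opensearch-py` is imported as `opensearchpy`, and `sentence-transformers` as `sentence_transformers`). A small standard-library sketch that checks whether each dependency can be found before running the RAG script; the `REQUIRED_PACKAGES` mapping and `check_installed` helper are illustrative, not part of the blog post.

```python
import importlib.util
from typing import Dict

# pip package name -> top-level module name used in `import` statements
REQUIRED_PACKAGES: Dict[str, str] = {
    "opensearch-py": "opensearchpy",
    "transformers": "transformers",
    "torch": "torch",
    "sentence-transformers": "sentence_transformers",
}


def check_installed(packages: Dict[str, str] = REQUIRED_PACKAGES) -> Dict[str, bool]:
    """Map each pip name to True if its module can be located (without importing it)."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    missing = [name for name, found in check_installed().items() if not found]
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All RAG dependencies are installed.")
```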


In this step, you'll initialize two key components:

- The SentenceTransformer model for creating document embeddings
Collaborator

Suggested change
- The SentenceTransformer model for creating document embeddings
- The SentenceTransformer model for creating document embeddings.

In this step, you'll initialize two key components:

- The SentenceTransformer model for creating document embeddings
- The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between performance and accuracy for embeddings.
Collaborator

Suggested change
- The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between performance and accuracy for embeddings.
- The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between embedding performance and accuracy.
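The choice of embedding model also fixes the vector size that the OpenSearch index must declare: MiniLM-L6-v2 sentence-transformers checkpoints (such as `all-MiniLM-L6-v2`) produce 384-dimensional embeddings. A minimal sketch of a matching index body, assuming hypothetical field names `text` and `embedding` and an illustrative `build_index_body` helper:

```python
from typing import Any, Dict

# MiniLM-L6-v2 sentence-transformers checkpoints output 384-dimensional vectors.
EMBEDDING_DIMENSION = 384


def build_index_body(dimension: int = EMBEDDING_DIMENSION) -> Dict[str, Any]:
    """Index settings and mappings for storing text next to its k-NN vector."""
    return {
        # Enable the k-NN plugin for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # hypothetical field holding the raw document
                # hypothetical vector field; dimension must equal the model's output size
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }
```

With `opensearch-py`, the body could be passed as `client.indices.create(index="documents", body=build_index_body())`, where the index name is also hypothetical. If you swap in a different embedding model, `get_sentence_embedding_dimension()` on a loaded `SentenceTransformer` returns the dimension to use.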

The RAG pipeline consists of two main functions:

* `index_document`: Converts text into embeddings and stores them in OpenSearch.
* `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the 3 most similar documents; you can adjust this number based on your needs.
Collaborator

Suggested change
* `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the 3 most similar documents; you can adjust this number based on your needs.
* `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the three most similar documents; you can adjust this number based on your needs.
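The request bodies behind `index_document` and `query_documents` might look like the following sketch. It assumes the same hypothetical `text` and `embedding` field names and uses the OpenSearch `knn` query with `vector` and `k`, defaulting `k` to three to match the example. The helper names `build_document`, `build_knn_query`, and `extract_texts` are illustrative and not taken from the blog script.

```python
from typing import Any, Dict, List, Sequence


def build_document(text: str, embedding: Sequence[float]) -> Dict[str, Any]:
    """Document body for indexing: the raw text plus its embedding vector."""
    # float() converts NumPy scalars (e.g. from model.encode) into plain Python floats.
    return {"text": text, "embedding": [float(x) for x in embedding]}


def build_knn_query(query_vector: Sequence[float], k: int = 3) -> Dict[str, Any]:
    """k-NN query body that asks for the k nearest neighbors of query_vector."""
    return {
        "size": k,  # limit the final hit list to k results
        "query": {
            "knn": {
                "embedding": {"vector": [float(x) for x in query_vector], "k": k}
            }
        },
    }


def extract_texts(response: Dict[str, Any]) -> List[str]:
    """Return the stored text of each hit, in the order OpenSearch ranked them."""
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]
```

For example, `client.index(index="documents", body=build_document(text, model.encode(text)))` would store a document, and `extract_texts(client.search(index="documents", body=build_knn_query(model.encode(question))))` would return the text of the three closest matches, ready to be placed into the DeepSeek prompt.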


## Try it out

The following complete one-click script contains all preceding steps. Save this script as `deepseek_rag.py`:
Collaborator

Suggested change
The following complete one-click script contains all preceding steps. Save this script as `deepseek_rag.py`:
The following complete one-click script contains all of the preceding steps. Save this script as `deepseek_rag.py`:


This simple RAG implementation combines the power of OpenSearch's vector search capabilities with DeepSeek's advanced language understanding. While this is a basic setup, it provides a foundation that you can build upon for more complex applications.

Key benefits of this implementation:
Collaborator

Suggested change
Key benefits of this implementation:
The following are some key benefits of this implementation:

@natebower
Collaborator

@nateynateynate This should not be merged until @kolchfa-aws addresses editorial review. Thanks!

@nateynateynate
Member

Shit. Sorry, I'll revert.

@nateynateynate
Member

@kolchfa-aws - My apologies. You might have to file another PR to get this completed. I can't seem to find a button to reopen it.

@kolchfa-aws mentioned this pull request on Jan 30, 2025
Development

Successfully merging this pull request may close these issues.

[BLOG] Zero to RAG: A Quick OpenSearch & DeepSeek Integration Guide
3 participants