Add DeepSeek integration blog #3595
Signed-off-by: Fanit Kolchina <[email protected]>
Looks good!
@kolchfa-aws Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!
redirect_from: '/authors/vikash/'
---
**Vikash Tiwari** is a Senior Software Development Engineer at Amazon Web Services, specializing in OpenSearch Vector Search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.
**Vikash Tiwari** is a Senior Software Development Engineer at AWS specializing in OpenSearch vector search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.
## RAG with OpenSearch and DeepSeek
The following diagram presents the RAG workflow in OpenSearch using the DeepSeek model.
The following diagram depicts the RAG workflow in OpenSearch using the DeepSeek model.
```
pip install opensearch-py transformers torch sentence-transformers
```
These packages form the backbone of our RAG system:
These packages form the backbone of the RAG system:
* `opensearch-py`: The official Python client for OpenSearch.
* `opensearch-py`: The official Python client for OpenSearch
* `transformers`: Hugging Face's library for working with transformer models.
* `transformers`: Hugging Face's library for working with transformer models
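The distribution names in the `pip install` command are not always the names used in `import` statements: `opensearch-py` is imported as `opensearchpy`, and `sentence-transformers` as `sentence_transformers`. A small sketch for checking that the environment is ready before running the script, assuming only the four packages from that install command (the helper name `missing_packages` is hypothetical):

```python
import importlib.util

# pip distribution names from the install command, mapped to the module names
# that Python actually imports.
REQUIRED_MODULES = {
    "opensearch-py": "opensearchpy",
    "transformers": "transformers",
    "torch": "torch",
    "sentence-transformers": "sentence_transformers",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose top-level module cannot be found."""
    # find_spec locates a module without importing it and returns None
    # when a top-level module is not installed.
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All RAG dependencies are installed.")
```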
In this step, you'll initialize two key components:
- The SentenceTransformer model for creating document embeddings
- The SentenceTransformer model for creating document embeddings.
- The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between performance and accuracy for embeddings.
- The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between embedding performance and accuracy.
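The choice of embedding model fixes the vector size that the OpenSearch index must accept. A minimal sketch of a create-index body, assuming the MiniLM-L6-v2 model is `sentence-transformers/all-MiniLM-L6-v2` (which produces 384-dimensional embeddings) and using hypothetical field names `text` and `embedding` that may differ from the blog's script:

```python
# Embedding size of sentence-transformers/all-MiniLM-L6-v2 (assumed to be the
# MiniLM-L6-v2 model referenced in the post).
EMBEDDING_DIMENSION = 384


def build_knn_index_body(dimension=EMBEDDING_DIMENSION):
    """Return a create-index body with a k-NN vector field (hypothetical field names)."""
    return {
        # Enables k-NN search on this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Original document text, returned with search hits.
                "text": {"type": "text"},
                # Vector field; its dimension must match the embedding model.
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }
```

With `opensearch-py`, this body would be passed to `client.indices.create(index="documents", body=build_knn_index_body())`, where `client` and the index name `documents` are placeholders. If you swap in a different embedding model, `SentenceTransformer.get_sentence_embedding_dimension()` returns the value to use for `dimension`.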
The RAG pipeline consists of two main functions:
* `index_document`: Converts text into embeddings and stores them in OpenSearch.
* `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the 3 most similar documents; you can adjust this number based on your needs.
* `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the three most similar documents; you can adjust this number based on your needs.
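The two pipeline functions map onto an index request and a k-NN search request. A minimal sketch of those request bodies, assuming hypothetical field names `text` and `embedding`; the blog script's own field names and query options may differ:

```python
def build_document(text, embedding):
    """Body for index_document: the raw text plus its embedding vector."""
    # float() turns NumPy scalars (as returned by model.encode) into plain
    # Python floats so the body is JSON-serializable.
    return {"text": text, "embedding": [float(x) for x in embedding]}


def build_knn_query(query_embedding, k=3):
    """Body for query_documents: return the k nearest neighbors (3 by default)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": [float(x) for x in query_embedding],
                    "k": k,
                }
            }
        },
    }


def extract_texts(response):
    """Collect the stored text of each hit from a search response."""
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]
```

With `opensearch-py`, these bodies would be sent with `client.index(index="documents", body=build_document(text, model.encode(text)))` and `client.search(index="documents", body=build_knn_query(model.encode(question)))`, where `client`, `model`, and the index name `documents` are placeholders. Changing `k` is the single knob for returning more or fewer documents.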
## Try it out
The following complete one-click script contains all preceding steps. Save this script as `deepseek_rag.py`:
The following complete one-click script contains all of the preceding steps. Save this script as `deepseek_rag.py`:
This simple RAG implementation combines the power of OpenSearch's vector search capabilities with DeepSeek's advanced language understanding. While this is a basic setup, it provides a foundation that you can build upon for more complex applications.
Key benefits of this implementation:
The following are some key benefits of this implementation:
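In this implementation, the retrieved documents are handed to DeepSeek as context for the answer. A minimal sketch of that hand-off, assuming a hypothetical plain-text prompt template and helper name `build_rag_prompt`; the blog's script may word or structure its prompt differently:

```python
def build_rag_prompt(question, context_docs):
    """Combine retrieved documents and the user question into one prompt string."""
    # Number each retrieved document so the model can refer to its sources.
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would then be tokenized and passed to the DeepSeek model's generation call. Keeping prompt assembly in its own function makes it easy to change the template without touching retrieval.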
@nateynateynate This should not be merged until @kolchfa-aws addresses the editorial review. Thanks!
Shit. Sorry, I'll revert.
@kolchfa-aws - My apologies. You might have to file another PR to get this completed. I can't seem to find a button to reopen it.
Closes #3587
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.