Add DeepSeek integration blog #3595

Merged

nateynateynate merged 2 commits into opensearch-project:main from kolchfa-aws:deepseek-integration

Jan 30, 2025

Collaborator

kolchfa-aws commented Jan 30, 2025

Closes #3587

Check List

Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.


          Add DeepSeek integration blog

8fb42b1

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws requested review from elfisher, AMoo-Miki, nknize, krisfreedain, peterzhuamazon, CEHENKLE, dtaivpp, nateynateynate and natebower as code owners

January 30, 2025 18:13


          Add meta

98db096

Signed-off-by: Fanit Kolchina <[email protected]>

nateynateynate approved these changes

Member

nateynateynate left a comment

Looks good!

nateynateynate merged commit 430d0d4 into opensearch-project:main

5 checks passed

natebower reviewed

Collaborator

natebower left a comment

@kolchfa-aws Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

_community_members/vikash.md

+              redirect_from: '/authors/vikash/'
+              ---
+              **Vikash Tiwari** is a Senior Software Development Engineer at Amazon Web Services, specializing in OpenSearch Vector Search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              **Vikash Tiwari** is a Senior Software Development Engineer at Amazon Web Services, specializing in OpenSearch Vector Search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.
+              **Vikash Tiwari** is a Senior Software Development Engineer at AWS specializing in OpenSearch vector search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise spans vector search, indexing optimizations, and efficient data retrieval, and he is deeply interested in learning and enhancing modern database systems.

_posts/2025-01-30-deepseek-integration-rag.md


		## RAG with OpenSearch and DeepSeek

		The following diagram presents the RAG workflow in OpenSearch using the DeepSeek model.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              The following diagram presents the RAG workflow in OpenSearch using the DeepSeek model.
+              The following diagram depicts the RAG workflow in OpenSearch using the DeepSeek model.

_posts/2025-01-30-deepseek-integration-rag.md

+              pip install opensearch-py transformers torch sentence-transformers
+              ```
+              These packages form the backbone of our RAG system:

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              These packages form the backbone of our RAG system:
+              These packages form the backbone of the RAG system:

_posts/2025-01-30-deepseek-integration-rag.md


		These packages form the backbone of our RAG system:

		* `opensearch-py`: The official Python client for OpenSearch.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              * `opensearch-py`: The official Python client for OpenSearch.
+              * `opensearch-py`: The official Python client for OpenSearch

_posts/2025-01-30-deepseek-integration-rag.md

+              These packages form the backbone of our RAG system:
+              * `opensearch-py`: The official Python client for OpenSearch.
+              * `transformers`: Hugging Face's library for working with transformer models.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              * `transformers`: Hugging Face's library for working with transformer models.
+              * `transformers`: Hugging Face's library for working with transformer models

_posts/2025-01-30-deepseek-integration-rag.md


		In this step, you'll initialize two key components:

		- The SentenceTransformer model for creating document embeddings

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              - The SentenceTransformer model for creating document embeddings
+              - The SentenceTransformer model for creating document embeddings.

_posts/2025-01-30-deepseek-integration-rag.md

+              In this step, you'll initialize two key components:
+              - The SentenceTransformer model for creating document embeddings
+              - The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between performance and accuracy for embeddings.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              - The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between performance and accuracy for embeddings.
+              - The DeepSeek model for generating responses. The MiniLM-L6-v2 model provides a good balance between embedding performance and accuracy.

_posts/2025-01-30-deepseek-integration-rag.md

+              The RAG pipeline consists of two main functions:
+              * `index_document`: Converts text into embeddings and stores them in OpenSearch.
+              * `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the 3 most similar documents; you can adjust this number based on your needs.

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              * `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the 3 most similar documents; you can adjust this number based on your needs.
+              * `query_documents`: Performs similarity search using k-NN to find relevant documents. In this example, k-NN search returns the three most similar documents; you can adjust this number based on your needs.
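
For readers of this thread, the k-NN lookup described in the `query_documents` bullet can be sketched roughly as follows. This is a minimal illustration, not the blog post's actual code: the helper name `build_knn_query` and the vector field name `embedding` are assumptions made for this example, and only the default of three results comes from the text under review.

```python
from typing import Any, Dict, List


def build_knn_query(query_vector: List[float], k: int = 3,
                    field: str = "embedding") -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body that returns the k nearest documents.

    `field` is a hypothetical name for the knn_vector field; the index
    mapping used in the blog post may name it differently.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {
        # Return at most k hits in the response.
        "size": k,
        # Ask the k-NN plugin for the k nearest neighbors of the query vector.
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }
```

With `opensearch-py`, a body like this would typically be passed to `client.search(index=..., body=build_knn_query(vector))`, where the index name and the way `vector` is produced (for example, by the SentenceTransformer model) depend on the rest of the script. Changing `k` adjusts how many similar documents are returned.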

_posts/2025-01-30-deepseek-integration-rag.md


		## Try it out

		The following complete one-click script contains all preceding steps. Save this script as `deepseek_rag.py`:

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              The following complete one-click script contains all preceding steps. Save this script as `deepseek_rag.py`:
+              The following complete one-click script contains all of the preceding steps. Save this script as `deepseek_rag.py`:

_posts/2025-01-30-deepseek-integration-rag.md


		This simple RAG implementation combines the power of OpenSearch's vector search capabilities with DeepSeek's advanced language understanding. While this is a basic setup, it provides a foundation that you can build upon for more complex applications.

		Key benefits of this implementation:

Collaborator

natebower Jan 30, 2025

Suggested change

      
-              Key benefits of this implementation:
+              The following are some key benefits of this implementation:

Collaborator

natebower commented Jan 30, 2025

@nateynateynate This should not be merged until @kolchfa-aws addresses editorial review. Thanks!

Member

nateynateynate commented Jan 30, 2025

Shit. Sorry, I'll revert.

nateynateynate mentioned this pull request

Revert "Add DeepSeek integration blog" #3596

Merged

Member

nateynateynate commented Jan 30, 2025

@kolchfa-aws - My apologies. You might have to file another PR to get this completed. I can't seem to find a button to reopen it.

kolchfa-aws mentioned this pull request

Add DeepSeek integration blog #3597

Merged

1 task

Reviewers

natebower natebower left review comments

nateynateynate nateynateynate approved these changes

elfisher Awaiting requested review from elfisher elfisher is a code owner

AMoo-Miki Awaiting requested review from AMoo-Miki AMoo-Miki is a code owner

nknize Awaiting requested review from nknize nknize is a code owner

krisfreedain Awaiting requested review from krisfreedain krisfreedain is a code owner

peterzhuamazon Awaiting requested review from peterzhuamazon peterzhuamazon is a code owner

CEHENKLE Awaiting requested review from CEHENKLE CEHENKLE is a code owner

dtaivpp Awaiting requested review from dtaivpp dtaivpp is a code owner
