
[Enhancement] Reduce memory consumption greatly, and speed up processing slightly, by processing the file asynchronously and inserting into the vectordb in chunks rather than all at once #213


Description

@MarcAmick

Currently, RAG creates embeddings for a file while holding all of them in memory, then bulk-inserts them into the vectordb in a single operation. When embedding very large files, this consumes a vast amount of memory. When memory limits are set in an AKS/EKS environment, the pod is more likely to hit the limit, causing it to crash and restart. This is largely solved by changing the logic so that the file is broken up into chunks, each chunk is embedded separately, and each chunk is asynchronously bulk-inserted as its embedding completes. Less memory is consumed because each chunk's embeddings are released from memory once inserted. It may also slightly increase speed: by the time the last bulk insert runs, the database has already received most of the document, so that final insert is much smaller and completes quickly. If any chunk results in an error, the entire document is removed from the db, so there is no chance of a partial file in the vectordb.
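A minimal sketch of this flow, assuming hypothetical `embed_chunk`, `insert_embeddings`, and `delete_document` helpers in place of the project's actual embedding model and vectordb client (names and chunk size are illustrative, not the real API):

```python
from __future__ import annotations

import asyncio
from collections.abc import Iterator
from typing import Optional

CHUNK_SIZE = 256  # units (e.g. lines or tokens) per chunk; tune to the memory budget


def read_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    """Stream the file in fixed-size chunks so it is never fully in memory."""
    buf: list[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line)
            if len(buf) >= chunk_size:
                yield buf
                buf = []
    if buf:
        yield buf


async def embed_chunk(chunk: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for the embedding model call.
    return [[0.0] * 8 for _ in chunk]


async def insert_embeddings(doc_id: str, vectors: list[list[float]]) -> None:
    # Hypothetical stand-in for one bulk insert into the vectordb.
    await asyncio.sleep(0)


async def delete_document(doc_id: str) -> None:
    # Hypothetical stand-in: remove every vector already stored for this document.
    await asyncio.sleep(0)


async def ingest(path: str, doc_id: str) -> None:
    """Embed and insert chunk by chunk; roll back the whole document on failure."""
    prev_insert: Optional[asyncio.Task] = None
    try:
        for chunk in read_chunks(path):
            vectors = await embed_chunk(chunk)
            # Overlap this chunk's embedding with the previous chunk's insert,
            # but keep at most one insert in flight so memory stays bounded.
            if prev_insert is not None:
                await prev_insert
            prev_insert = asyncio.create_task(insert_embeddings(doc_id, vectors))
        if prev_insert is not None:
            await prev_insert  # the final, much smaller insert
    except Exception:
        await delete_document(doc_id)  # no partial document left in the vectordb
        raise


if __name__ == "__main__":
    # Tiny demo: write a sample file, then ingest it.
    with open("example.txt", "w", encoding="utf-8") as f:
        f.writelines(f"line {i}\n" for i in range(1000))
    asyncio.run(ingest("example.txt", doc_id="doc-1"))
```

Capping the pipeline at one in-flight insert keeps roughly two chunks' worth of embeddings in memory at any time; widening that cap would trade memory for throughput.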

Please see the pull request addressing this enhancement: #214
