
Conversation

@MarcAmick

Currently, RAG creates embeddings for the entire file, holds them all in memory, and then bulk-inserts them into the vectordb in one operation. When embedding very large files, this consumes a vast amount of memory. When memory limits are set in an AKS/EKS environment, the pod is more likely to hit that limit, causing it to crash and restart. This is largely solved by breaking the file into chunks, embedding each separately, and asynchronously bulk-inserting each chunk as its embedding completes. Less memory is consumed because each chunk's embeddings are released once it has been inserted. It may also be slightly faster: most of the document has already been inserted by the time the last bulk insert runs, so that final insert is small and completes quickly. If any chunk fails with an error, the entire document is removed from the db, so there is no chance of a partial file in the vectordb.
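For illustration, a minimal sketch of this kind of producer-consumer pipeline. This is not the PR's actual code: embed_batch and insert_batch are hypothetical stand-ins for the real embedding call and vector-db bulk insert, and the batch/queue sizes mirror the config values discussed below.

```python
import asyncio
from typing import List

EMBEDDING_BATCH_SIZE = 750  # the PR's config hint suggests 750 as a balance point
MAX_QUEUE_SIZE = 3          # caps how many embedded batches can wait in memory at once

async def embed_batch(texts: List[str]) -> List[List[float]]:
    await asyncio.sleep(0.01)  # placeholder for a real embeddings API call
    return [[0.0] * 4 for _ in texts]

async def insert_batch(vectors: List[List[float]], texts: List[str]) -> None:
    await asyncio.sleep(0.01)  # placeholder for a vector-db bulk insert

async def producer(chunks: List[str], queue: asyncio.Queue) -> None:
    # Embed one batch at a time; put() blocks while the queue is full, so peak
    # memory is bounded by MAX_QUEUE_SIZE embedded batches, not the whole file.
    for i in range(0, len(chunks), EMBEDDING_BATCH_SIZE):
        batch = chunks[i : i + EMBEDDING_BATCH_SIZE]
        await queue.put((await embed_batch(batch), batch))
    await queue.put(None)  # sentinel: no more batches

async def consumer(queue: asyncio.Queue) -> None:
    # Insert each batch as soon as it arrives; its memory is reclaimable afterwards.
    while (item := await queue.get()) is not None:
        vectors, texts = item
        await insert_batch(vectors, texts)

async def stream_document(chunks: List[str]) -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_QUEUE_SIZE)
    await asyncio.gather(producer(chunks, queue), consumer(queue))

asyncio.run(stream_document([f"chunk {i}" for i in range(2000)]))
```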

…ration and DB insertion.

This provides better memory efficiency by immediately inserting embeddings as they're generated.
@MarcAmick MarcAmick changed the title Stream document to db as document as embedding is occurring to reduce memory consumption Stream document to db as embedding is occurring to reduce memory consumption Oct 21, 2025
# hint: 750 seems to be a good balance between speed and memory usage for text-embedding-3-small embeddings
EMBEDDING_BATCH_SIZE = int(get_env_variable("EMBEDDING_BATCH_SIZE", "0"))
# when EMBEDDING_BATCH_SIZE is set, this controls the max size the queue of batches to process can contain at any one time
EMBEDIING_MAX_QUEUE_SIZE = int(get_env_variable("EMBEDIING_MAX_QUEUE_SIZE", "3"))

typo?

@danny-avila danny-avila requested a review from Copilot November 26, 2025 16:12
@danny-avila
Owner

Thanks for the PR, will review soon

Copilot finished reviewing on behalf of danny-avila November 26, 2025 16:15

Copilot AI left a comment


Pull request overview

This PR implements a streaming approach to document embedding and vector database insertion to reduce memory consumption when processing large files. Instead of embedding entire documents in memory before inserting them, the code now processes documents in configurable batches, inserting each batch as embeddings are generated.

Key changes:

  • Introduces configurable batch processing via EMBEDDING_BATCH_SIZE environment variable
  • Implements async producer-consumer pipeline for concurrent embedding generation and database insertion
  • Adds rollback mechanisms to ensure no partial documents remain in the database on failure

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 18 comments.

  • app/config.py: Adds new configuration variables for embedding batch size and queue size to control memory usage
  • app/routes/document_routes.py: Implements an async producer-consumer pipeline for streaming embeddings, adds a batched sync fallback, and updates the main insertion logic to use the new streaming approach when a batch size is configured


logger.error(f"Batch {batch_idx + 1} failed: {batch_error}")

# Rollback entire file from vector store
if all_ids: # If we have some successful inserts

Copilot AI Nov 26, 2025


[nitpick] The rollback logic checks for all_ids to determine if any successful inserts occurred, but then attempts to delete by file_id. If individual batch insertions succeeded, all_ids will contain multiple IDs for the same file_id. The deletion should work correctly since it deletes by file_id, but the comment "If we have some successful inserts" is misleading - it should check if any batches succeeded, not just if all_ids is non-empty. Consider clarifying this logic or comment.

Suggested change
if all_ids: # If we have some successful inserts
if all_ids: # If any batch succeeded (i.e., any chunks for this file were inserted)
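To make the control flow under discussion concrete, here is a hedged sketch of the rollback idea, with an explicit success flag as the review suggests. `store.add_documents` and `store.delete(ids=[...])` are assumptions about the store's API, not the PR's verbatim code.

```python
# Illustrative only: tracks batch success with an explicit flag rather than
# inferring it from the accumulated ID list.
def insert_with_rollback(store, batches, file_id):
    any_batch_succeeded = False
    try:
        for batch in batches:
            store.add_documents(batch, ids=[file_id] * len(batch))
            any_batch_succeeded = True
    except Exception:
        if any_batch_succeeded:
            # Every chunk shares file_id, so one delete removes the partial file.
            store.delete(ids=[file_id])
        raise
```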

Comment on lines +595 to +596
from app.config import EMBEDDINGS_PROVIDER, init_embeddings, EMBEDDINGS_MODEL
embedding = init_embeddings(EMBEDDINGS_PROVIDER, EMBEDDINGS_MODEL)

Copilot AI Nov 26, 2025


The embedding function is initialized inside the conditional block for every document upload when EMBEDDING_BATCH_SIZE > 0. This import and initialization could be expensive and should ideally be done once at module level or cached. Consider moving the import to the top of the file and potentially caching the embedding function initialization.
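One way to address this, sketched under the assumption that `init_embeddings` is safe to call once per process. The imported names come from the snippet above; the caching wiring itself is illustrative, not the PR's code.

```python
from functools import lru_cache

from app.config import EMBEDDINGS_MODEL, EMBEDDINGS_PROVIDER, init_embeddings

@lru_cache(maxsize=1)
def get_embedding_function():
    # Initialized once on first use; subsequent uploads reuse the same instance.
    return init_embeddings(EMBEDDINGS_PROVIDER, EMBEDDINGS_MODEL)
```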

Comment on lines +353 to +357
temp_vector_store = AsyncPgVector(
connection_string=vector_store.connection_string,
embedding_function=request_embedding_func,
collection_name=vector_store.collection_name,
)

Copilot AI Nov 26, 2025


[nitpick] Creating a new AsyncPgVector instance for each document upload could be inefficient. This creates a new database connection and configuration for each upload. Consider whether the existing vector_store could be used with a request-specific embedding function wrapper, or if connection pooling is properly configured to handle this pattern efficiently.
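A hypothetical wrapper along the lines the comment suggests, assuming the store keeps its embedding function in an `embedding_function` attribute (as the constructor arguments above imply):

```python
from contextlib import contextmanager

@contextmanager
def request_embeddings(store, embedding_func):
    # Temporarily point the existing store at a request-specific embedding
    # function instead of constructing a new AsyncPgVector per upload.
    original = store.embedding_function
    store.embedding_function = embedding_func
    try:
        yield store
    finally:
        store.embedding_function = original
```

Note this mutation is only safe if the store instance is not shared across concurrent requests; if it is, a per-request instance (as the PR does) backed by proper connection pooling is the safer choice.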

# hint: 750 seems to be a good balance between speed and memory usage for text-embedding-3-small embeddings
EMBEDDING_BATCH_SIZE = int(get_env_variable("EMBEDDING_BATCH_SIZE", "0"))
# when EMBEDDING_BATCH_SIZE is set, this controls the max size the queue of batches to process can contain at any one time
EMBEDIING_MAX_QUEUE_SIZE = int(get_env_variable("EMBEDIING_MAX_QUEUE_SIZE", "3"))

Copilot AI Nov 26, 2025


Typo in variable name: EMBEDIING_MAX_QUEUE_SIZE should be EMBEDDING_MAX_QUEUE_SIZE (missing 'D' in 'EMBEDDING'). This constant name should match the corrected import.

Suggested change
EMBEDIING_MAX_QUEUE_SIZE = int(get_env_variable("EMBEDIING_MAX_QUEUE_SIZE", "3"))
EMBEDDING_MAX_QUEUE_SIZE = int(get_env_variable("EMBEDDING_MAX_QUEUE_SIZE", "3"))

else:
ids = vector_store.add_documents(docs, ids=[file_id] * len(documents))
# asynchronously embed the file and insert into vector store as it is embedding
# to lesson memory impact and speed up slightly as the majority of the document

Copilot AI Nov 26, 2025


Typo: "lesson" should be "lessen" (meaning to reduce or decrease).

Suggested change
# to lesson memory impact and speed up slightly as the majority of the document
# to lessen memory impact and speed up slightly as the majority of the document

vector_store
) -> List[str]:
"""
Fallback batched processing for sync vector stores.

Copilot AI Nov 26, 2025


[nitpick] The function docstring is incomplete. It should document the parameters (documents, file_id, request_embedding_func, vector_store) and return value (List[str]). Consider adding a complete docstring with parameter descriptions, return value, and potential exceptions raised.

Suggested change
Fallback batched processing for sync vector stores.
Processes documents in batches and adds them to a synchronous vector store.

Args:
    documents (List[Document]): The list of Document objects to process and add.
    file_id (str): The identifier for the file associated with these documents.
    request_embedding_func: The embedding function to use for processing documents.
    vector_store: The synchronous vector store instance to which documents are added.

Returns:
    List[str]: A list of IDs corresponding to the successfully added documents.

Raises:
    Exception: If any batch fails to process, the exception is raised after attempting rollback.

Comment on lines +441 to +446
# Wait for both to complete
# return_exceptions = false will force catching the exception below in this try
results = await asyncio.gather(producer_task, consumer_task, return_exceptions=False)

# Get final results
batch_results = await results_queue.get()

Copilot AI Nov 26, 2025


The comment says "return_exceptions = false will force catching the exception below" but asyncio.gather() is called with return_exceptions=False. When return_exceptions=False, exceptions are propagated immediately and will be caught by the outer try-except block. However, if an exception occurs, the code on line 446 (batch_results = await results_queue.get()) may never execute because the exception would have already been raised. Consider restructuring this logic to ensure proper error handling flow.
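One possible restructuring, sketched with illustrative names: have the consumer return its results so gather() delivers them directly, rather than reading a results queue that is only reachable when gather() succeeds.

```python
import asyncio

async def run_pipeline(producer_coro, consumer_coro):
    producer_task = asyncio.create_task(producer_coro)
    consumer_task = asyncio.create_task(consumer_coro)
    try:
        # gather() raises on the first failure; on success the consumer's
        # return value arrives as the second element, so no results queue is needed.
        _, batch_results = await asyncio.gather(producer_task, consumer_task)
        return batch_results
    except Exception:
        for task in (producer_task, consumer_task):
            if not task.done():
                task.cancel()
        # Await the cancelled tasks so cleanup completes (see the later comment).
        await asyncio.gather(producer_task, consumer_task, return_exceptions=True)
        raise
```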


finally:
# Cleanup
del temp_vector_store, request_embedding_func

Copilot AI Nov 26, 2025


[nitpick] The explicit del statement is unnecessary. These objects will be automatically garbage collected when they go out of scope. This explicit deletion provides no benefit and can be removed for cleaner code.

Suggested change
del temp_vector_store, request_embedding_func

if not consumer_task.done():
consumer_task.cancel()
if not producer_task.done():
producer_task.cancel()

Copilot AI Nov 26, 2025


After cancelling tasks, you should await them to ensure proper cleanup. Task cancellation in asyncio requires awaiting the cancelled task to handle the CancelledError exception. Add await asyncio.gather(consumer_task, producer_task, return_exceptions=True) after the cancellation to ensure proper cleanup.

Suggested change
producer_task.cancel()
producer_task.cancel()
# Await cancelled tasks to ensure proper cleanup
await asyncio.gather(consumer_task, producer_task, return_exceptions=True)

try:
# Wait for both to complete
# return_exceptions = false will force catching the exception below in this try
results = await asyncio.gather(producer_task, consumer_task, return_exceptions=False)

Copilot AI Nov 26, 2025


Variable results is not used.

Suggested change
results = await asyncio.gather(producer_task, consumer_task, return_exceptions=False)
await asyncio.gather(producer_task, consumer_task, return_exceptions=False)

@danny-avila
Owner

@MarcAmick please fix the typos and address each Copilot comment
