
[Performance] Refactor HSG waypoint creation to use vector store ANN search. #141

@rileydes-improving

Description

Problem
The current implementation of Hierarchical Sectored Graph (HSG) maintenance tasks, specifically waypoint creation, loads large datasets into memory and performs O(N) cosine similarity calculations in JavaScript/Python. This bypasses the optimized search capabilities (ANN, KNN, etc.) of the underlying vector stores (pgvector or Valkey).

Code Citations

  1. Single Waypoint Creation
    https://github.com/CaviraOSS/OpenMemory/blob/main/packages/openmemory-js/src/memory/hsg.ts#L502-L533
    https://github.com/CaviraOSS/OpenMemory/blob/main/packages/openmemory-py/src/openmemory/memory/hsg.py#L266-L286
    This function fetches 1,000 rows into memory and iterates through them on the CPU to find a single nearest neighbour.

  2. Inter-Memory Waypoint Creation
    https://github.com/CaviraOSS/OpenMemory/blob/main/packages/openmemory-js/src/memory/hsg.ts#L534-L567
    https://github.com/CaviraOSS/OpenMemory/blob/main/packages/openmemory-py/src/openmemory/memory/hsg.py#L299-L317
    This function fetches all vectors for a given sector (which could be thousands or millions) and performs a linear scan.
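For reference, the linear-scan pattern both functions follow reduces to the sketch below. It is a simplified illustration written for this issue, not the code in hsg.ts or hsg.py (the names `cosine` and `linear_nearest` are hypothetical). Every row is scored, so the cost is O(N·d) for N rows of dimension d, however few neighbours are actually needed.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain cosine similarity; a zero-length vector scores 0.0.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def linear_nearest(
    query: Sequence[float],
    rows: Iterable[Tuple[str, Sequence[float]]],
) -> Tuple[Optional[str], float]:
    # Every fetched row is scored on the CPU: O(N * d) work, plus O(N * d)
    # memory when the rows are materialized up front as in the cited functions.
    best_id, best_sim = None, -math.inf
    for row_id, vec in rows:
        sim = cosine(query, vec)
        if sim > best_sim:
            best_id, best_sim = row_id, sim
    return best_id, best_sim
```

An ANN index answers the same question by visiting only a small part of the data set, which is what the refactor below relies on.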

Performance Impact:

  • Network overhead: every candidate vector is transferred from the store to the application
  • CPU overhead: cosine similarity is computed in JavaScript or Python (single-threaded)
  • Memory pressure: thousands (or millions) of vectors are loaded into the heap on every memory insert
  • Latency: waypoint generation time grows linearly with database size, because each insert scans the full candidate set
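A back-of-envelope estimate makes these costs concrete. The sketch assumes float32 components and a hypothetical 1536-dimensional embedding (neither figure comes from the codebase; real dimensions depend on the embedding model) and ignores serialization overhead. Because each insert scans the vectors already in its sector, filling a sector of N memories costs N(N-1)/2 similarity computations in total: linear per insert, quadratic overall.

```python
def bytes_per_scan(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    # Raw payload of one full scan (float32 = 4 bytes per component);
    # ignores row ids, metadata and serialization overhead.
    return n_vectors * dim * bytes_per_component


def cumulative_comparisons(n_inserts: int) -> int:
    # Insert number i (0-based) scans the i vectors already in the sector,
    # so filling a sector costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 similarities.
    return n_inserts * (n_inserts - 1) // 2


if __name__ == "__main__":
    print(bytes_per_scan(1_000_000, 1536))  # 6144000000 bytes, about 6.1 GB
    print(cumulative_comparisons(10_000))   # 49995000
```

Under those assumptions, one scan over 1,000,000 vectors moves roughly 6.1 GB, and populating a 10,000-memory sector performs 49,995,000 cosine computations.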

Proposed Solution:
Refactor both packages to use vector store ANN search capabilities.

JS Refactor
Refactor hsg.ts to use the existing VectorStore.searchSimilar() interface, which delegates the search to the vector store and its native index.

Python Refactor (HSG Logic)
Update hsg.py to use store.search() instead of manual NumPy loops, mirroring the JS refactor.
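A minimal sketch of the refactored waypoint logic, assuming a store that exposes `search(query, sector, k)` and returns hits ordered by descending similarity. That signature, the `Hit` type, the `min_sim` threshold and the function names are hypothetical and would need to be matched to the real `store.search()` in the Python package (or `VectorStore.searchSimilar()` in hsg.ts). `BruteForceStore` is only a test double; in production the same call would be served by the pgvector or Valkey index.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Sequence, Tuple


@dataclass
class Hit:
    id: str
    similarity: float  # cosine similarity, higher is closer


class VectorStore(Protocol):
    # Hypothetical contract: top-k hits within a sector, best first.
    def search(self, query: Sequence[float], sector: str, k: int) -> List[Hit]: ...


def create_single_waypoint(store: VectorStore, new_id: str, vec: Sequence[float],
                           sector: str, min_sim: float = 0.75) -> Optional[Tuple[str, str, float]]:
    # k=2 leaves room to skip a self-match if the new memory is already indexed.
    for hit in store.search(vec, sector, k=2):
        if hit.id != new_id:
            # The first non-self hit is the nearest neighbour.
            return (new_id, hit.id, hit.similarity) if hit.similarity >= min_sim else None
    return None


def create_inter_memory_waypoints(store: VectorStore, new_id: str, vec: Sequence[float],
                                  sector: str, k: int = 5,
                                  min_sim: float = 0.75) -> List[Tuple[str, str, float]]:
    # One bounded top-k query replaces the full-sector scan.
    hits = store.search(vec, sector, k=k + 1)
    edges = [(new_id, h.id, h.similarity) for h in hits
             if h.id != new_id and h.similarity >= min_sim]
    return edges[:k]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceStore:
    """Test double only: exact scan standing in for the pgvector/Valkey index."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, List[float]]] = {}

    def add(self, mem_id: str, sector: str, vec: Sequence[float]) -> None:
        self.rows[mem_id] = (sector, list(vec))

    def search(self, query: Sequence[float], sector: str, k: int) -> List[Hit]:
        hits = [Hit(i, _cosine(query, v)) for i, (s, v) in self.rows.items() if s == sector]
        hits.sort(key=lambda h: h.similarity, reverse=True)
        return hits[:k]
```

Keeping the HSG functions dependent only on the `search` contract means the same code works against either backend, and the test double lets the waypoint rules be unit-tested without a database.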
You would also need to rewrite ValkeyVectorStore.search in valkey.py to use RediSearch commands (FT.SEARCH) instead of SCAN. It will also need to ensure that the index exists, in __init__ or _get_client.
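On the Valkey side, the SCAN loop can be replaced by an index created once and queried with a KNN clause. The helpers below only build raw command arguments, so they can be checked without a server. The index name, key prefix, field names (`sector`, `vector`), the HNSW and COSINE choices and the dimension are all hypothetical and must match how valkey.py actually stores memories. The syntax follows the RediSearch-compatible FT.CREATE / FT.SEARCH form; which options (for example TAG pre-filters) are supported should be confirmed against the Valkey Search documentation for the deployed version.

```python
import struct
from typing import List, Sequence, Union

Arg = Union[str, bytes]


def vector_to_blob(vec: Sequence[float]) -> bytes:
    # FLOAT32 vectors are passed as packed little-endian bytes.
    return struct.pack(f"<{len(vec)}f", *vec)


def ft_create_args(index: str, prefix: str, dim: int) -> List[Arg]:
    # Run once (e.g. from __init__ or _get_client). "6" counts the HNSW
    # attribute tokens that follow: TYPE FLOAT32 DIM <dim> DISTANCE_METRIC COSINE.
    return ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
            "SCHEMA", "sector", "TAG",
            "vector", "VECTOR", "HNSW", "6",
            "TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE"]


def ft_knn_search_args(index: str, vec: Sequence[float], sector: str, k: int) -> List[Arg]:
    # Pre-filter on the sector TAG, then take the k nearest vectors.
    # Sector names containing punctuation or spaces would need TAG escaping (not handled).
    query = f"(@sector:{{{sector}}})=>[KNN {k} @vector $BLOB AS score]"
    return ["FT.SEARCH", index, query,
            "PARAMS", "2", "BLOB", vector_to_blob(vec),
            # Explicit LIMIT so k is not truncated by the default page size (10 in RediSearch).
            "LIMIT", "0", str(k),
            "DIALECT", "2"]


def distance_to_similarity(distance: float) -> float:
    # With DISTANCE_METRIC COSINE the returned score is a distance (1 - similarity).
    return 1.0 - distance
```

With redis-py or valkey-py, these lists can be sent via `client.execute_command(*args)`. Issuing the `FT.CREATE` once and tolerating an "index already exists" error would cover the index-existence check.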

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
