Skip to content

Conversation

@ShuhaoZhangTony
Copy link
Member

No description provided.

ShuhaoZhangTony and others added 4 commits November 21, 2025 20:58
…llection

- Add complete SimpleGraphIndex class (241 lines) with adjacency list-based graph storage
- Add complete GraphMemoryCollection class (268 lines) extending BaseMemoryCollection
- Implement graph operations: add/remove nodes/edges, get neighbors, BFS traversal
- Add JSON-based persistence (store/load) for graph indexes
- Support weighted directed edges with bidirectional tracking
- Export SimpleGraphIndex in memory_collection/__init__.py

This implementation resolves the missing SimpleGraphIndex import error that was
causing CI failures in the main SAGE repository.

Fixes: ImportError when importing sage.middleware.components.sage_mem
* collection update

* Initial plan

* Apply code review suggestions to vdb_collection.py

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>
Co-authored-by: Shuhao Zhang (Tony) <[email protected]>
* feat(vdb_collection): Add comprehensive memory statistics tracking

Implement statistics tracking for VDBMemoryCollection:
- Track insert, retrieve, index creation, and rebuild operations
- Record retrieval performance metrics (duration, result count)
- Calculate memory usage estimates based on vector dimensions
- Add query APIs: get_statistics(), get_memory_stats(), get_retrieve_stats(), get_index_rebuild_stats()
- Add reset_statistics() to clear counters
- Persist statistics in config.json with backwards compatibility
- Minimal performance impact (~5 ops per insert, ~6 per retrieve)

Statistics tracked:
- insert_count, retrieve_count, index_create_count, index_rebuild_count
- total_vectors_stored across all indexes
- Per-retrieval metrics with timestamps
- Per-index statistics including vector counts and timestamps

Related to SAGE PR #1145 and Issue #223

* fix: place statistics methods inside VDBMemoryCollection class

- Move all statistics methods from outside class to inside class
- Methods were incorrectly placed after run_test() function
- Fixes AttributeError when trying to access statistics methods

* feat: add CI/CD configuration and pre-commit hooks

- Add GitHub Actions workflow (.github/workflows/test.yml)
  * Lint job: ruff check and format validation
  * Validate job: pyproject.toml and syntax checks
  * Build job: package build validation with twine
- Add pre-commit hooks configuration (.pre-commit-config.yaml)
  * File checks (trailing whitespace, EOF, YAML/JSON/TOML validation)
  * Ruff linter and formatter (auto-fix on commit)
  * Python syntax check
- Update pyproject.toml with ruff configuration
  * Line length: 100
  * Modern Python 3.10+ rules
  * Import sorting, code style enforcement
- Add CI_CD_README.md documentation
- Auto-fix 44 code quality issues with ruff
- Format code with ruff format (9 files reformatted)

This enables lightweight CI/CD for neuromem repository while
comprehensive integration tests remain in SAGE main repo.

* docs: add contributor guide for CI/CD setup

Add CONTRIBUTING.md with:
- Quick start guide for first-time setup
- Daily development workflow
- Handling pre-commit hook failures
- Troubleshooting common issues
- Reference to detailed CI_CD_README.md

* style: fix EOF in README.md

* fix: apply ruff code quality fixes and enable unsafe-fixes in pre-commit

- Fix B904: Add 'from e' to exception chains
- Fix SIM108: Use ternary operator instead of if-else blocks
- Fix SIM118: Remove unnecessary .keys() in dict iteration
- Fix UP035: Replace deprecated typing imports (Dict, List, Type) with built-in types
- Enable --unsafe-fixes in .pre-commit-config.yaml for automatic fixes

* feat: add automatic pre-commit hooks installation in setup.py

- Add PostDevelopCommand and PostInstallCommand to automatically install pre-commit hooks
- Detect both standalone and submodule git repositories (.git file or directory)
- Update dev dependencies to include ruff and pre-commit
- Silently skip if pre-commit is not installed or not in a git repo

This makes pre-commit hooks installation automatic when installing neuromem,
reducing manual setup steps for developers.

* docs: update CONTRIBUTING with automatic pre-commit hooks setup

- Add Option 1: Automatic setup via SAGE quickstart.sh
- Add Option 2: Manual standalone setup
- Clarify that pip install -e .[dev] auto-installs hooks
- Emphasize recommended approach for SAGE developers

* fix: pin ruff version to 0.8.4 for consistency across all environments

PROBLEM:
- Local pre-commit hooks used ruff v0.8.4
- CI/CD used latest ruff (variable version)
- This caused inconsistent linting results between local and CI

SOLUTION:
- Pin ruff==0.8.4 in .github/workflows/test.yml
- Pin ruff==0.8.4 in setup.py dev dependencies
- Document version pinning strategy in CI_CD_README.md

Now pre-commit and CI/CD use identical ruff version, ensuring:
✅ Same linting rules
✅ Same auto-fixes
✅ No surprises in CI after local checks pass

* fix: upgrade ruff to 0.14.2 to match SAGE main repository

PROBLEM:
- neuromem used ruff 0.8.4
- SAGE main repo uses ruff 0.14.2
- Version mismatch could cause conflicts and inconsistent linting

SOLUTION:
- Upgrade to ruff==0.14.2 in all configurations:
  * .pre-commit-config.yaml: v0.14.2
  * .github/workflows/test.yml: ruff==0.14.2
  * setup.py: ruff==0.14.2
- Update CI_CD_README.md to document alignment with SAGE

BENEFITS:
✅ Consistent with SAGE main repository
✅ Latest ruff features and bug fixes
✅ No version conflicts when developing within SAGE
✅ Future-proof (using actively maintained version)

* feat: add unit tests for neuromem

ADDED:
- tests/test_neuromem_standalone.py: Standalone smoke tests for neuromem package
- tests/test_vdb_statistics.py: Comprehensive VDB statistics tests (moved from SAGE)
- Updated CI workflow to run standalone tests

DESIGN:
- test_neuromem_standalone.py: Simple tests that work without SAGE dependencies
- test_vdb_statistics.py: Full tests that require SAGE environment (run in SAGE CI)
- Dual-environment approach: neuromem CI runs standalone, SAGE CI runs comprehensive

BENEFITS:
✅ Neuromem has its own basic test suite
✅ Avoid test duplication - statistics tests moved from SAGE to neuromem
✅ SAGE CI will run comprehensive tests with full dependencies
✅ Neuromem CI validates standalone functionality

* fix: replace absolute SAGE import with relative import in graph_collection

PROBLEM:
- graph_collection.py used absolute import:
  from sage.middleware.components.sage_mem.neuromem.utils.path_utils import ...
- This breaks neuromem when used as standalone package (ModuleNotFoundError: No module named 'sage')
- CI tests failed because neuromem couldn't import its own modules

SOLUTION:
- Replace with relative import: from ..utils.path_utils import get_default_data_dir
- Fix import ordering with ruff 0.14.2

VERIFICATION:
✅ Local pre-commit hooks pass
✅ ruff check passes
✅ Works in both SAGE and standalone contexts

* fix(ci): 修复本地与CI环境不一致的代码质量检查问题

- 使用 pre-commit 替代直接运行 ruff,确保本地和CI行为一致
- 添加 ruff.lint.isort 配置,明确 import 排序规则
- 重写 test_neuromem_standalone.py,移除对 SAGE 依赖的导入
- 添加详细的 CI 失败修复指南

修复问题:
1. Import 排序在不同环境下结果不一致
2. 独立测试因导入 SAGE 模块而失败

参考 SAGE 主仓库的 CI/CD 最佳实践

* style: remove blank line after import pytest in test_vdb_statistics

Fix CI ruff format check failure

* fix(ci): run standalone tests without installing package

Avoid importing __init__.py which contains SAGE dependencies.
Install only pytest, not the full neuromem package with dev dependencies.

* Initial plan

* fix(ci): resolve ruff check and unit test failures

- Remove blank lines between standard and third-party imports in
  kv_collection.py, vdb_collection.py, faiss_index.py
- Update standalone tests to run from /tmp to avoid importing package __init__.py
- Remove tests/__init__.py to prevent pytest from treating tests as a subpackage
- Add pytest configuration to pyproject.toml

Co-authored-by: ShuhaoZhangTony <[email protected]>

* refactor: address code review feedback in test file

- Add return type annotation to get_neuromem_root()
- Improve docstring with detailed path resolution explanation
- Extract duplicate lists to module-level constants

Co-authored-by: ShuhaoZhangTony <[email protected]>

* Rebase memory statistics feature to main-dev external embedder pattern (#7)

* Initial plan

* Rebase vdb_collection.py to use main-dev's external embedder approach with statistics tracking

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>

* fix: apply ruff formatting to resolve CI code quality failures (#9)

* Initial plan

* fix: apply ruff formatting to vdb_collection.py

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
…on.py (#10)

* feat(vdb_collection): Add comprehensive memory statistics tracking

Implement statistics tracking for VDBMemoryCollection:
- Track insert, retrieve, index creation, and rebuild operations
- Record retrieval performance metrics (duration, result count)
- Calculate memory usage estimates based on vector dimensions
- Add query APIs: get_statistics(), get_memory_stats(), get_retrieve_stats(), get_index_rebuild_stats()
- Add reset_statistics() to clear counters
- Persist statistics in config.json with backwards compatibility
- Minimal performance impact (~5 ops per insert, ~6 per retrieve)

Statistics tracked:
- insert_count, retrieve_count, index_create_count, index_rebuild_count
- total_vectors_stored across all indexes
- Per-retrieval metrics with timestamps
- Per-index statistics including vector counts and timestamps

Related to SAGE PR #1145 and Issue #223

* fix: place statistics methods inside VDBMemoryCollection class

- Move all statistics methods from outside class to inside class
- Methods were incorrectly placed after run_test() function
- Fixes AttributeError when trying to access statistics methods

* feat: add CI/CD configuration and pre-commit hooks

- Add GitHub Actions workflow (.github/workflows/test.yml)
  * Lint job: ruff check and format validation
  * Validate job: pyproject.toml and syntax checks
  * Build job: package build validation with twine
- Add pre-commit hooks configuration (.pre-commit-config.yaml)
  * File checks (trailing whitespace, EOF, YAML/JSON/TOML validation)
  * Ruff linter and formatter (auto-fix on commit)
  * Python syntax check
- Update pyproject.toml with ruff configuration
  * Line length: 100
  * Modern Python 3.10+ rules
  * Import sorting, code style enforcement
- Add CI_CD_README.md documentation
- Auto-fix 44 code quality issues with ruff
- Format code with ruff format (9 files reformatted)

This enables lightweight CI/CD for neuromem repository while
comprehensive integration tests remain in SAGE main repo.

* docs: add contributor guide for CI/CD setup

Add CONTRIBUTING.md with:
- Quick start guide for first-time setup
- Daily development workflow
- Handling pre-commit hook failures
- Troubleshooting common issues
- Reference to detailed CI_CD_README.md

* style: fix EOF in README.md

* fix: apply ruff code quality fixes and enable unsafe-fixes in pre-commit

- Fix B904: Add 'from e' to exception chains
- Fix SIM108: Use ternary operator instead of if-else blocks
- Fix SIM118: Remove unnecessary .keys() in dict iteration
- Fix UP035: Replace deprecated typing imports (Dict, List, Type) with built-in types
- Enable --unsafe-fixes in .pre-commit-config.yaml for automatic fixes

* feat: add automatic pre-commit hooks installation in setup.py

- Add PostDevelopCommand and PostInstallCommand to automatically install pre-commit hooks
- Detect both standalone and submodule git repositories (.git file or directory)
- Update dev dependencies to include ruff and pre-commit
- Silently skip if pre-commit is not installed or not in a git repo

This makes pre-commit hooks installation automatic when installing neuromem,
reducing manual setup steps for developers.

* docs: update CONTRIBUTING with automatic pre-commit hooks setup

- Add Option 1: Automatic setup via SAGE quickstart.sh
- Add Option 2: Manual standalone setup
- Clarify that pip install -e .[dev] auto-installs hooks
- Emphasize recommended approach for SAGE developers

* fix: pin ruff version to 0.8.4 for consistency across all environments

PROBLEM:
- Local pre-commit hooks used ruff v0.8.4
- CI/CD used latest ruff (variable version)
- This caused inconsistent linting results between local and CI

SOLUTION:
- Pin ruff==0.8.4 in .github/workflows/test.yml
- Pin ruff==0.8.4 in setup.py dev dependencies
- Document version pinning strategy in CI_CD_README.md

Now pre-commit and CI/CD use identical ruff version, ensuring:
✅ Same linting rules
✅ Same auto-fixes
✅ No surprises in CI after local checks pass

* fix: upgrade ruff to 0.14.2 to match SAGE main repository

PROBLEM:
- neuromem used ruff 0.8.4
- SAGE main repo uses ruff 0.14.2
- Version mismatch could cause conflicts and inconsistent linting

SOLUTION:
- Upgrade to ruff==0.14.2 in all configurations:
  * .pre-commit-config.yaml: v0.14.2
  * .github/workflows/test.yml: ruff==0.14.2
  * setup.py: ruff==0.14.2
- Update CI_CD_README.md to document alignment with SAGE

BENEFITS:
✅ Consistent with SAGE main repository
✅ Latest ruff features and bug fixes
✅ No version conflicts when developing within SAGE
✅ Future-proof (using actively maintained version)

* feat: add unit tests for neuromem

ADDED:
- tests/test_neuromem_standalone.py: Standalone smoke tests for neuromem package
- tests/test_vdb_statistics.py: Comprehensive VDB statistics tests (moved from SAGE)
- Updated CI workflow to run standalone tests

DESIGN:
- test_neuromem_standalone.py: Simple tests that work without SAGE dependencies
- test_vdb_statistics.py: Full tests that require SAGE environment (run in SAGE CI)
- Dual-environment approach: neuromem CI runs standalone, SAGE CI runs comprehensive

BENEFITS:
✅ Neuromem has its own basic test suite
✅ Avoid test duplication - statistics tests moved from SAGE to neuromem
✅ SAGE CI will run comprehensive tests with full dependencies
✅ Neuromem CI validates standalone functionality

* fix: replace absolute SAGE import with relative import in graph_collection

PROBLEM:
- graph_collection.py used absolute import:
  from sage.middleware.components.sage_mem.neuromem.utils.path_utils import ...
- This breaks neuromem when used as standalone package (ModuleNotFoundError: No module named 'sage')
- CI tests failed because neuromem couldn't import its own modules

SOLUTION:
- Replace with relative import: from ..utils.path_utils import get_default_data_dir
- Fix import ordering with ruff 0.14.2

VERIFICATION:
✅ Local pre-commit hooks pass
✅ ruff check passes
✅ Works in both SAGE and standalone contexts

* fix(ci): 修复本地与CI环境不一致的代码质量检查问题

- 使用 pre-commit 替代直接运行 ruff,确保本地和CI行为一致
- 添加 ruff.lint.isort 配置,明确 import 排序规则
- 重写 test_neuromem_standalone.py,移除对 SAGE 依赖的导入
- 添加详细的 CI 失败修复指南

修复问题:
1. Import 排序在不同环境下结果不一致
2. 独立测试因导入 SAGE 模块而失败

参考 SAGE 主仓库的 CI/CD 最佳实践

* style: remove blank line after import pytest in test_vdb_statistics

Fix CI ruff format check failure

* fix(ci): run standalone tests without installing package

Avoid importing __init__.py which contains SAGE dependencies.
Install only pytest, not the full neuromem package with dev dependencies.

* Initial plan

* fix(ci): resolve ruff check and unit test failures

- Remove blank lines between standard and third-party imports in
  kv_collection.py, vdb_collection.py, faiss_index.py
- Update standalone tests to run from /tmp to avoid importing package __init__.py
- Remove tests/__init__.py to prevent pytest from treating tests as a subpackage
- Add pytest configuration to pyproject.toml

Co-authored-by: ShuhaoZhangTony <[email protected]>

* refactor: address code review feedback in test file

- Add return type annotation to get_neuromem_root()
- Improve docstring with detailed path resolution explanation
- Extract duplicate lists to module-level constants

Co-authored-by: ShuhaoZhangTony <[email protected]>

* Rebase memory statistics feature to main-dev external embedder pattern (#7)

* Initial plan

* Rebase vdb_collection.py to use main-dev's external embedder approach with statistics tracking

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>

* fix: apply ruff formatting to resolve CI code quality failures (#9)

* Initial plan

* fix: apply ruff formatting to vdb_collection.py

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>

* Initial plan

* fix: remove duplicate imports and trailing whitespace in vdb_collection.py

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: zhang shuhao <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>
@ShuhaoZhangTony
Copy link
Member Author

@copilot Run pre-commit run --all-files --show-diff-on-failure || {
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...............................................................Passed
check json...........................................(no files to check)Skipped
check toml...............................................................Passed
check for added large files..............................................Passed
check for merge conflicts................................................Passed
mixed line ending........................................................Passed
detect private key.......................................................Passed
ruff check...............................................................Failed

  • hook id: ruff
  • exit code: 1
  • files were modified by this hook

Found 1 error (1 fixed, 0 remaining).

ruff format..............................................................Failed

  • hook id: ruff-format
  • files were modified by this hook

1 file reformatted, 28 files left unchanged

Check Python syntax......................................................Passed
pre-commit hook(s) made changes.
If you are seeing this message in CI, reproduce locally with: pre-commit run --all-files.
To run pre-commit as part of git workflow, use pre-commit install.
All changes made by hooks:
diff --git a/memory_collection/vdb_collection.py b/memory_collection/vdb_collection.py
index f96da36..9584962 100644
--- a/memory_collection/vdb_collection.py
+++ b/memory_collection/vdb_collection.py
@@ -883,8 +883,6 @@ if name == "main":

 from sage.common.components.sage_embedding.embedding_api import apply_embedding_model
  • from sage.common.components.sage_embedding.embedding_api import apply_embedding_model
  • def colored(text, color):
    colors = {
    "green": "\033[92m",

❌ 代码质量检查失败!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 本地复现和修复方法:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1️⃣ 安装 pre-commit(一次性):
pip install pre-commit
pre-commit install # 安装 git hooks

2️⃣ 运行检查(与CI完全一致):
pre-commit run --all-files

3️⃣ 自动修复大多数问题:
pre-commit run --all-files # 会自动修复格式问题

4️⃣ 手动修复特定问题:
ruff check --fix . # 修复linting问题
ruff format . # 格式化代码

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📝 注意:安装 pre-commit hooks 后,每次提交时会自动检查
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Error: Process completed with exit code 1.

Copy link
Contributor

Copilot AI commented Nov 26, 2025

@ShuhaoZhangTony I've opened a new pull request, #11, to work on those changes. Once the pull request is ready, I'll request review from you.

KimmoZAG and others added 3 commits November 26, 2025 18:11
* feat(vdb): improve threshold handling and index configuration

- Fix nested config merging in FaissIndex initialization
- Make threshold optional (None = no filtering, only topk)
- Update threshold documentation for different index types
- Change warning to info level for non-critical logs
- Fix LSH/Hamming distance threshold filtering logic

* fix: remove trailing whitespace and duplicate import
* Initial plan

* fix: remove duplicate import in vdb_collection.py to pass pre-commit checks

Co-authored-by: ShuhaoZhangTony <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: ShuhaoZhangTony <[email protected]>
- Add proper type annotations to avoid None attribute errors
- Fix vector processing with proper type guards
- Use typed intermediate variables (processed_vector, processed_query, proc_vec)
- Fix .get() chain calls to use direct dict access
- Add None checks for config parameter
- Fix test code to handle None return values
@ShuhaoZhangTony ShuhaoZhangTony merged commit 120a105 into main Nov 29, 2025
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants