-
Notifications
You must be signed in to change notification settings - Fork 0
Main dev #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Main dev #6
Conversation
…llection - Add complete SimpleGraphIndex class (241 lines) with adjacency list-based graph storage - Add complete GraphMemoryCollection class (268 lines) extending BaseMemoryCollection - Implement graph operations: add/remove nodes/edges, get neighbors, BFS traversal - Add JSON-based persistence (store/load) for graph indexes - Support weighted directed edges with bidirectional tracking - Export SimpleGraphIndex in memory_collection/__init__.py This implementation resolves the missing SimpleGraphIndex import error that was causing CI failures in the main SAGE repository. Fixes: ImportError when importing sage.middleware.components.sage_mem
* collection update * Initial plan * Apply code review suggestions to vdb_collection.py Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]> Co-authored-by: Shuhao Zhang (Tony) <[email protected]>
* feat(vdb_collection): Add comprehensive memory statistics tracking Implement statistics tracking for VDBMemoryCollection: - Track insert, retrieve, index creation, and rebuild operations - Record retrieval performance metrics (duration, result count) - Calculate memory usage estimates based on vector dimensions - Add query APIs: get_statistics(), get_memory_stats(), get_retrieve_stats(), get_index_rebuild_stats() - Add reset_statistics() to clear counters - Persist statistics in config.json with backwards compatibility - Minimal performance impact (~5 ops per insert, ~6 per retrieve) Statistics tracked: - insert_count, retrieve_count, index_create_count, index_rebuild_count - total_vectors_stored across all indexes - Per-retrieval metrics with timestamps - Per-index statistics including vector counts and timestamps Related to SAGE PR #1145 and Issue #223 * fix: place statistics methods inside VDBMemoryCollection class - Move all statistics methods from outside class to inside class - Methods were incorrectly placed after run_test() function - Fixes AttributeError when trying to access statistics methods * feat: add CI/CD configuration and pre-commit hooks - Add GitHub Actions workflow (.github/workflows/test.yml) * Lint job: ruff check and format validation * Validate job: pyproject.toml and syntax checks * Build job: package build validation with twine - Add pre-commit hooks configuration (.pre-commit-config.yaml) * File checks (trailing whitespace, EOF, YAML/JSON/TOML validation) * Ruff linter and formatter (auto-fix on commit) * Python syntax check - Update pyproject.toml with ruff configuration * Line length: 100 * Modern Python 3.10+ rules * Import sorting, code style enforcement - Add CI_CD_README.md documentation - Auto-fix 44 code quality issues with ruff - Format code with ruff format (9 files reformatted) This enables lightweight CI/CD for neuromem repository while comprehensive integration tests remain in SAGE main repo. * docs: add contributor guide for CI/CD setup Add CONTRIBUTING.md with: - Quick start guide for first-time setup - Daily development workflow - Handling pre-commit hook failures - Troubleshooting common issues - Reference to detailed CI_CD_README.md * style: fix EOF in README.md * fix: apply ruff code quality fixes and enable unsafe-fixes in pre-commit - Fix B904: Add 'from e' to exception chains - Fix SIM108: Use ternary operator instead of if-else blocks - Fix SIM118: Remove unnecessary .keys() in dict iteration - Fix UP035: Replace deprecated typing imports (Dict, List, Type) with built-in types - Enable --unsafe-fixes in .pre-commit-config.yaml for automatic fixes * feat: add automatic pre-commit hooks installation in setup.py - Add PostDevelopCommand and PostInstallCommand to automatically install pre-commit hooks - Detect both standalone and submodule git repositories (.git file or directory) - Update dev dependencies to include ruff and pre-commit - Silently skip if pre-commit is not installed or not in a git repo This makes pre-commit hooks installation automatic when installing neuromem, reducing manual setup steps for developers. * docs: update CONTRIBUTING with automatic pre-commit hooks setup - Add Option 1: Automatic setup via SAGE quickstart.sh - Add Option 2: Manual standalone setup - Clarify that pip install -e .[dev] auto-installs hooks - Emphasize recommended approach for SAGE developers * fix: pin ruff version to 0.8.4 for consistency across all environments PROBLEM: - Local pre-commit hooks used ruff v0.8.4 - CI/CD used latest ruff (variable version) - This caused inconsistent linting results between local and CI SOLUTION: - Pin ruff==0.8.4 in .github/workflows/test.yml - Pin ruff==0.8.4 in setup.py dev dependencies - Document version pinning strategy in CI_CD_README.md Now pre-commit and CI/CD use identical ruff version, ensuring: ✅ Same linting rules ✅ Same auto-fixes ✅ No surprises in CI after local checks pass * fix: upgrade ruff to 0.14.2 to match SAGE main repository PROBLEM: - neuromem used ruff 0.8.4 - SAGE main repo uses ruff 0.14.2 - Version mismatch could cause conflicts and inconsistent linting SOLUTION: - Upgrade to ruff==0.14.2 in all configurations: * .pre-commit-config.yaml: v0.14.2 * .github/workflows/test.yml: ruff==0.14.2 * setup.py: ruff==0.14.2 - Update CI_CD_README.md to document alignment with SAGE BENEFITS: ✅ Consistent with SAGE main repository ✅ Latest ruff features and bug fixes ✅ No version conflicts when developing within SAGE ✅ Future-proof (using actively maintained version) * feat: add unit tests for neuromem ADDED: - tests/test_neuromem_standalone.py: Standalone smoke tests for neuromem package - tests/test_vdb_statistics.py: Comprehensive VDB statistics tests (moved from SAGE) - Updated CI workflow to run standalone tests DESIGN: - test_neuromem_standalone.py: Simple tests that work without SAGE dependencies - test_vdb_statistics.py: Full tests that require SAGE environment (run in SAGE CI) - Dual-environment approach: neuromem CI runs standalone, SAGE CI runs comprehensive BENEFITS: ✅ Neuromem has its own basic test suite ✅ Avoid test duplication - statistics tests moved from SAGE to neuromem ✅ SAGE CI will run comprehensive tests with full dependencies ✅ Neuromem CI validates standalone functionality * fix: replace absolute SAGE import with relative import in graph_collection PROBLEM: - graph_collection.py used absolute import: from sage.middleware.components.sage_mem.neuromem.utils.path_utils import ... - This breaks neuromem when used as standalone package (ModuleNotFoundError: No module named 'sage') - CI tests failed because neuromem couldn't import its own modules SOLUTION: - Replace with relative import: from ..utils.path_utils import get_default_data_dir - Fix import ordering with ruff 0.14.2 VERIFICATION: ✅ Local pre-commit hooks pass ✅ ruff check passes ✅ Works in both SAGE and standalone contexts * fix(ci): 修复本地与CI环境不一致的代码质量检查问题 - 使用 pre-commit 替代直接运行 ruff,确保本地和CI行为一致 - 添加 ruff.lint.isort 配置,明确 import 排序规则 - 重写 test_neuromem_standalone.py,移除对 SAGE 依赖的导入 - 添加详细的 CI 失败修复指南 修复问题: 1. Import 排序在不同环境下结果不一致 2. 独立测试因导入 SAGE 模块而失败 参考 SAGE 主仓库的 CI/CD 最佳实践 * style: remove blank line after import pytest in test_vdb_statistics Fix CI ruff format check failure * fix(ci): run standalone tests without installing package Avoid importing __init__.py which contains SAGE dependencies. Install only pytest, not the full neuromem package with dev dependencies. * Initial plan * fix(ci): resolve ruff check and unit test failures - Remove blank lines between standard and third-party imports in kv_collection.py, vdb_collection.py, faiss_index.py - Update standalone tests to run from /tmp to avoid importing package __init__.py - Remove tests/__init__.py to prevent pytest from treating tests as a subpackage - Add pytest configuration to pyproject.toml Co-authored-by: ShuhaoZhangTony <[email protected]> * refactor: address code review feedback in test file - Add return type annotation to get_neuromem_root() - Improve docstring with detailed path resolution explanation - Extract duplicate lists to module-level constants Co-authored-by: ShuhaoZhangTony <[email protected]> * Rebase memory statistics feature to main-dev external embedder pattern (#7) * Initial plan * Rebase vdb_collection.py to use main-dev's external embedder approach with statistics tracking Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]> * fix: apply ruff formatting to resolve CI code quality failures (#9) * Initial plan * fix: apply ruff formatting to vdb_collection.py Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]>
…on.py (#10) * feat(vdb_collection): Add comprehensive memory statistics tracking Implement statistics tracking for VDBMemoryCollection: - Track insert, retrieve, index creation, and rebuild operations - Record retrieval performance metrics (duration, result count) - Calculate memory usage estimates based on vector dimensions - Add query APIs: get_statistics(), get_memory_stats(), get_retrieve_stats(), get_index_rebuild_stats() - Add reset_statistics() to clear counters - Persist statistics in config.json with backwards compatibility - Minimal performance impact (~5 ops per insert, ~6 per retrieve) Statistics tracked: - insert_count, retrieve_count, index_create_count, index_rebuild_count - total_vectors_stored across all indexes - Per-retrieval metrics with timestamps - Per-index statistics including vector counts and timestamps Related to SAGE PR #1145 and Issue #223 * fix: place statistics methods inside VDBMemoryCollection class - Move all statistics methods from outside class to inside class - Methods were incorrectly placed after run_test() function - Fixes AttributeError when trying to access statistics methods * feat: add CI/CD configuration and pre-commit hooks - Add GitHub Actions workflow (.github/workflows/test.yml) * Lint job: ruff check and format validation * Validate job: pyproject.toml and syntax checks * Build job: package build validation with twine - Add pre-commit hooks configuration (.pre-commit-config.yaml) * File checks (trailing whitespace, EOF, YAML/JSON/TOML validation) * Ruff linter and formatter (auto-fix on commit) * Python syntax check - Update pyproject.toml with ruff configuration * Line length: 100 * Modern Python 3.10+ rules * Import sorting, code style enforcement - Add CI_CD_README.md documentation - Auto-fix 44 code quality issues with ruff - Format code with ruff format (9 files reformatted) This enables lightweight CI/CD for neuromem repository while comprehensive integration tests remain in SAGE main repo. * docs: add contributor guide for CI/CD setup Add CONTRIBUTING.md with: - Quick start guide for first-time setup - Daily development workflow - Handling pre-commit hook failures - Troubleshooting common issues - Reference to detailed CI_CD_README.md * style: fix EOF in README.md * fix: apply ruff code quality fixes and enable unsafe-fixes in pre-commit - Fix B904: Add 'from e' to exception chains - Fix SIM108: Use ternary operator instead of if-else blocks - Fix SIM118: Remove unnecessary .keys() in dict iteration - Fix UP035: Replace deprecated typing imports (Dict, List, Type) with built-in types - Enable --unsafe-fixes in .pre-commit-config.yaml for automatic fixes * feat: add automatic pre-commit hooks installation in setup.py - Add PostDevelopCommand and PostInstallCommand to automatically install pre-commit hooks - Detect both standalone and submodule git repositories (.git file or directory) - Update dev dependencies to include ruff and pre-commit - Silently skip if pre-commit is not installed or not in a git repo This makes pre-commit hooks installation automatic when installing neuromem, reducing manual setup steps for developers. * docs: update CONTRIBUTING with automatic pre-commit hooks setup - Add Option 1: Automatic setup via SAGE quickstart.sh - Add Option 2: Manual standalone setup - Clarify that pip install -e .[dev] auto-installs hooks - Emphasize recommended approach for SAGE developers * fix: pin ruff version to 0.8.4 for consistency across all environments PROBLEM: - Local pre-commit hooks used ruff v0.8.4 - CI/CD used latest ruff (variable version) - This caused inconsistent linting results between local and CI SOLUTION: - Pin ruff==0.8.4 in .github/workflows/test.yml - Pin ruff==0.8.4 in setup.py dev dependencies - Document version pinning strategy in CI_CD_README.md Now pre-commit and CI/CD use identical ruff version, ensuring: ✅ Same linting rules ✅ Same auto-fixes ✅ No surprises in CI after local checks pass * fix: upgrade ruff to 0.14.2 to match SAGE main repository PROBLEM: - neuromem used ruff 0.8.4 - SAGE main repo uses ruff 0.14.2 - Version mismatch could cause conflicts and inconsistent linting SOLUTION: - Upgrade to ruff==0.14.2 in all configurations: * .pre-commit-config.yaml: v0.14.2 * .github/workflows/test.yml: ruff==0.14.2 * setup.py: ruff==0.14.2 - Update CI_CD_README.md to document alignment with SAGE BENEFITS: ✅ Consistent with SAGE main repository ✅ Latest ruff features and bug fixes ✅ No version conflicts when developing within SAGE ✅ Future-proof (using actively maintained version) * feat: add unit tests for neuromem ADDED: - tests/test_neuromem_standalone.py: Standalone smoke tests for neuromem package - tests/test_vdb_statistics.py: Comprehensive VDB statistics tests (moved from SAGE) - Updated CI workflow to run standalone tests DESIGN: - test_neuromem_standalone.py: Simple tests that work without SAGE dependencies - test_vdb_statistics.py: Full tests that require SAGE environment (run in SAGE CI) - Dual-environment approach: neuromem CI runs standalone, SAGE CI runs comprehensive BENEFITS: ✅ Neuromem has its own basic test suite ✅ Avoid test duplication - statistics tests moved from SAGE to neuromem ✅ SAGE CI will run comprehensive tests with full dependencies ✅ Neuromem CI validates standalone functionality * fix: replace absolute SAGE import with relative import in graph_collection PROBLEM: - graph_collection.py used absolute import: from sage.middleware.components.sage_mem.neuromem.utils.path_utils import ... - This breaks neuromem when used as standalone package (ModuleNotFoundError: No module named 'sage') - CI tests failed because neuromem couldn't import its own modules SOLUTION: - Replace with relative import: from ..utils.path_utils import get_default_data_dir - Fix import ordering with ruff 0.14.2 VERIFICATION: ✅ Local pre-commit hooks pass ✅ ruff check passes ✅ Works in both SAGE and standalone contexts * fix(ci): 修复本地与CI环境不一致的代码质量检查问题 - 使用 pre-commit 替代直接运行 ruff,确保本地和CI行为一致 - 添加 ruff.lint.isort 配置,明确 import 排序规则 - 重写 test_neuromem_standalone.py,移除对 SAGE 依赖的导入 - 添加详细的 CI 失败修复指南 修复问题: 1. Import 排序在不同环境下结果不一致 2. 独立测试因导入 SAGE 模块而失败 参考 SAGE 主仓库的 CI/CD 最佳实践 * style: remove blank line after import pytest in test_vdb_statistics Fix CI ruff format check failure * fix(ci): run standalone tests without installing package Avoid importing __init__.py which contains SAGE dependencies. Install only pytest, not the full neuromem package with dev dependencies. * Initial plan * fix(ci): resolve ruff check and unit test failures - Remove blank lines between standard and third-party imports in kv_collection.py, vdb_collection.py, faiss_index.py - Update standalone tests to run from /tmp to avoid importing package __init__.py - Remove tests/__init__.py to prevent pytest from treating tests as a subpackage - Add pytest configuration to pyproject.toml Co-authored-by: ShuhaoZhangTony <[email protected]> * refactor: address code review feedback in test file - Add return type annotation to get_neuromem_root() - Improve docstring with detailed path resolution explanation - Extract duplicate lists to module-level constants Co-authored-by: ShuhaoZhangTony <[email protected]> * Rebase memory statistics feature to main-dev external embedder pattern (#7) * Initial plan * Rebase vdb_collection.py to use main-dev's external embedder approach with statistics tracking Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]> * fix: apply ruff formatting to resolve CI code quality failures (#9) * Initial plan * fix: apply ruff formatting to vdb_collection.py Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]> * Initial plan * fix: remove duplicate imports and trailing whitespace in vdb_collection.py Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: zhang shuhao <[email protected]> Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]>
|
@copilot Run pre-commit run --all-files --show-diff-on-failure || {
Found 1 error (1 fixed, 0 remaining). ruff format..............................................................Failed
1 file reformatted, 28 files left unchanged Check Python syntax......................................................Passed
❌ 代码质量检查失败! ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1️⃣ 安装 pre-commit(一次性): 2️⃣ 运行检查(与CI完全一致): 3️⃣ 自动修复大多数问题: 4️⃣ 手动修复特定问题: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Error: Process completed with exit code 1. |
|
@ShuhaoZhangTony I've opened a new pull request, #11, to work on those changes. Once the pull request is ready, I'll request review from you. |
* feat(vdb): improve threshold handling and index configuration - Fix nested config merging in FaissIndex initialization - Make threshold optional (None = no filtering, only topk) - Update threshold documentation for different index types - Change warning to info level for non-critical logs - Fix LSH/Hamming distance threshold filtering logic * fix: remove trailing whitespace and duplicate import
* Initial plan * fix: remove duplicate import in vdb_collection.py to pass pre-commit checks Co-authored-by: ShuhaoZhangTony <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: ShuhaoZhangTony <[email protected]>
- Add proper type annotations to avoid None attribute errors - Fix vector processing with proper type guards - Use typed intermediate variables (processed_vector, processed_query, proc_vec) - Fix .get() chain calls to use direct dict access - Add None checks for config parameter - Fix test code to handle None return values
No description provided.