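Below are the overall design approach, implementation highlights, and knowledge-base strategy for this "memoized search + behavior summary" change.
Overall approach
Switch from "hit by question" to "hit by method": the cache primary key is "method key + method source hash", and the question is used only to retrieve and locate the target method. Synonymous phrasings of a question therefore reuse the same entry naturally.
Vector-first hits: an embedding vector is stored for each method. Later questions use the question vector to run a similarity search over the cached method vectors under the hit file, so a hit returns within seconds.
Automatic invalidation: a change to the method source changes the method hash, so a new version of the summary is written automatically. The old version is kept for auditing.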
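The keying scheme above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `SummaryCache` class, the `method_hash` helper, the choice of SHA-256, and the sample method key `geometry.area` are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


def method_hash(source: str) -> str:
    """Hash the method source; any edit to the body yields a new hash."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class SummaryCache:
    # (method_key, method_hash) -> summary; old versions are never deleted.
    entries: dict = field(default_factory=dict)

    def get(self, method_key: str, source: str):
        """Return the summary for this exact source version, or None."""
        return self.entries.get((method_key, method_hash(source)))

    def put(self, method_key: str, source: str, summary: str) -> None:
        self.entries[(method_key, method_hash(source))] = summary

    def versions(self, method_key: str) -> list:
        """All stored hashes for a method, kept for auditing."""
        return [h for (k, h) in self.entries if k == method_key]


cache = SummaryCache()
v1 = "def area(r):\n    return 3.14 * r * r\n"
v2 = "def area(r):\n    return math.pi * r * r\n"

cache.put("geometry.area", v1, "Computes a circle's area.")
stale = cache.get("geometry.area", v2)  # lookup with the edited source misses
cache.put("geometry.area", v2, "Computes a circle's area using math.pi.")
```

Because the question text never appears in the key, two differently worded questions that resolve to the same method share one entry, and an edited method simply misses until its new summary is written.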
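First query:
Select file → select method-level snippets → generate the method summary (LLM/AST) → write summary + method_embedding.
Repeat query (identical or synonymous):
Question vector → search within the hit file's cached method vectors → return the matched method summary directly, hit_count +1.
Change:
Method body edited → method_hash changes → write a new version of the summary (the old version is kept); the embedding is updated from the new source.
No answer:
Return [NO_ANSWER] and do not write to the cache.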
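The four paths above could look roughly like this. It is a sketch under stated assumptions: the `Entry` fields other than `summary`, `method_embedding`, `method_hash`, and `hit_count` are invented here, the `current` flag is one hypothetical way to keep old versions without serving them, the 0.8 similarity threshold is arbitrary, and embeddings are passed in as plain lists rather than produced by a real model.

```python
import hashlib
import math
from dataclasses import dataclass

NO_ANSWER = "[NO_ANSWER]"


@dataclass
class Entry:
    method_key: str
    method_hash: str
    summary: str
    method_embedding: list
    hit_count: int = 0
    current: bool = True  # assumption: marks the latest version of a method


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(cache, file_path, question_vec, threshold=0.8):
    """Repeat path: search only the cached method vectors of the hit file."""
    best, best_score = None, threshold
    for entry in cache.get(file_path, []):
        if not entry.current:
            continue  # old versions stay stored but are not served
        score = cosine(question_vec, entry.method_embedding)
        if score >= best_score:
            best, best_score = entry, score
    if best is not None:
        best.hit_count += 1
    return best


def store(cache, file_path, method_key, source, summary, embedding):
    """First-query and change paths; [NO_ANSWER] results are never cached."""
    if summary == NO_ANSWER:
        return None
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    entries = cache.setdefault(file_path, [])
    match = None
    for entry in entries:
        if entry.method_key == method_key:
            # Only the version whose hash matches the new source is current.
            entry.current = entry.method_hash == digest
            if entry.current:
                match = entry
    if match is None:
        match = Entry(method_key, digest, summary, embedding)
        entries.append(match)
    return match
```

A caller would try `lookup` first and fall back to the full select-and-summarize pipeline followed by `store` only on a miss.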
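Knowledge base and retrieval strategy
Overall approach to building the knowledge base:
Switch from "full-text indexing by description" to "indexing by name, with descriptions used only for display", which makes retrieval more stable and the index smaller.
Write both datasets, "operator skills" and "terms (cc.*)", into a single FAISS index with one shared meta file, so retrieval goes through a single entry point.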
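To make the name-only indexing idea concrete, here is a small sketch. It deliberately swaps FAISS for a brute-force cosine search and a real embedding model for character-bigram counts, so that it runs with the standard library alone. The `NameIndex` class, the `kind` field, and the sample records (including the term key `cc.stun`) are hypothetical.

```python
import math
from collections import Counter


def embed_name(name: str) -> Counter:
    """Toy embedding: character-bigram counts (stand-in for a real model)."""
    s = name.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NameIndex:
    """One vector list plus one parallel meta list for every record kind."""

    def __init__(self):
        self.vectors = []  # built from names only
        self.meta = []     # descriptions live here, for display only

    def add(self, kind: str, name: str, description: str) -> None:
        self.vectors.append(embed_name(name))
        self.meta.append({"kind": kind, "name": name, "description": description})

    def search(self, query: str, k: int = 3):
        q = embed_name(query)
        scored = sorted(
            ((cosine(q, v), i) for i, v in enumerate(self.vectors)),
            reverse=True,
        )
        return [(score, self.meta[i]) for score, i in scored[:k]]


index = NameIndex()
index.add("skill", "Power Strike", "Next attack deals increased damage.")
index.add("skill", "Swift Strike", "Raises attack speed for a short time.")
index.add("term", "cc.stun", "Target cannot move or attack.")
```

Skills and terms share one search call, and each result's `kind` tells the caller which dataset it came from. Since descriptions never enter the vectors, rewording a description does not shift retrieval results.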
This is a simple implementation, and there is still room for optimization.