Morphic-Agent Architecture

Clean Architecture (4-layer) + TDD + Pydantic Strict Mode + OSS-First

Phase 1 Foundation: COMPLETE (2026-02-25) — 7/7 sprints done
Phase 2 Parallel & Planning + CLI: COMPLETE (2026-02-26) — All 6 sprints (2-A through 2-F) + CLI v1
Phase 3 Semantic Memory & Context Bridge: COMPLETE (2026-02-26) — Week 5: SemanticFingerprint LSH → ContextZipper v2 → ForgettingCurve → DeltaEncoder → HierarchicalSummarizer | Week 6: Context Bridge → MCP Server → MCP Client → L1-L4 Integration — 725 tests (699 unit + 26 integration)
Phase 4 Agent CLI Orchestration: COMPLETE — Sprint 4.1–4.6 done (domain foundation + 6 drivers + RouteToEngine use case + API/CLI + knowledge file injection) — 898 unit tests + 30 integration
Phase 5 Marketplace & Tools: COMPLETE — Sprint 5.1–5.6 done (safety scorer + MCP registry + tool installer + auto discoverer + Ollama manager + marketplace UI) — 988 unit tests + 37 integration
Phase 6 Self-Evolution: COMPLETE — Sprint 6.1–6.5 done (execution recorder + tactical recovery + strategy learning + systemic evolution + evolution dashboard) — 1162 unit tests + 37 integration
Phase 7 UCL: Sprint 7.1 COMPLETE (2026-03-10) — UCL domain foundation (Decision, AgentAction, SharedTaskState, AgentAffinityScore, CognitiveMemoryType, 2 ports, AgentAffinityScorer service) — 1230 unit tests + 37 integration
Phase 7 UCL: Sprint 7.2 COMPLETE (2026-03-10) — ContextAdapterPort + InMemorySharedTaskStateRepo + 6 context adapters (Claude Code, Gemini, Codex, Ollama, OpenHands, ADK) — 1350 unit tests + 37 integration
Phase 7 UCL: Sprint 7.3 COMPLETE (2026-03-11) — Insight Extraction Pipeline (MemoryClassifier + ConflictResolver + InsightExtractor + ExtractInsightsUseCase + ExecuteTask integration + container wiring) — 1438 unit tests + 37 integration
Phase 7 UCL: Sprint 7.4 COMPLETE (2026-03-11) — Affinity-Aware Routing + Task Handoff (AgentAffinityRepository port + TopicExtractor + select_with_affinity() + InMemory/JSONL affinity stores + RouteToEngine with affinity/adapter/action recording + HandoffTaskUseCase + container wiring) — 1528 unit tests + 37 integration
Phase 7 UCL: Sprint 7.5 COMPLETE (2026-03-11) — UCL API + CLI + UI (cognitive API routes + CLI commands + Next.js cognitive page + 30 new tests) — 1558 unit tests + 37 integration
Sprint 5.7 COMPLETE (2026-03-12) — Model management test coverage (15 tests) + auto-discovery trigger on task failure (8 tests) — Phase 5 fully complete — 1592 unit tests + 50 integration
Sprint 8.1 COMPLETE (2026-03-12) — Self-Evolution Loop Closure: ExecuteTaskUseCase now auto-records ExecutionRecords after every task execution, feeding Phase 6's AnalyzeExecution/UpdateStrategy/SystemicEvolution pipeline — 1599 unit tests + 50 integration
Sprint 8.2 COMPLETE (2026-03-14) — Production Verification: Fix settings-driven default Ollama model in LiteLLMGateway (TD-065), add E2E smoke test suite (29 pytest + 21 shell checks covering all 42 API endpoints) (TD-066) — 1599 unit + 29 E2E + 50 integration, 0 failures


Phase 10 — UX Intelligence: COMPLETE (2026-03-17)
Sprint 10.1: Complexity-Aware Execution Prompts — ExecutionPromptBuilder + AnswerExtractor (TD-072)
Sprint 10.2: React Flow Node Redesign — DAG depth layout + tooltips + dynamic height (TD-073)
Sprint 10.3: Result Display Restructure — answer/reasoning separation + complexity-aware UI (TD-074)
1,705 unit tests + 50 integration, 0 failures, lint clean

Phase 12 — ReAct Loop + Model-Aware Decomposition + Two Worlds: COMPLETE (2026-03-20)
Sprint 12.1-12.4: ReAct Loop — ReactExecutor, ReactTrace, ToolSchema, ToolSelector, web_search/web_fetch, LiteLLM tool-calling, 38 LAEE tool schemas (TD-077 to TD-080)
Sprint 12.5: Model-Aware Subtask Decomposition — ModelPreference VO, ModelPreferenceExtractor, preferred_model on SubTask (TD-081)
Sprint 12.6: Dynamic Multi-Model Decomposition — CollaborationMode, ModelCapabilityRegistry, LLM-based _llm_multi_model_decompose with static fallback (TD-082)
Sprint 12.7: Two Worlds Integration — per-engine routing, auto-upgrade, discussion phase, DEGRADED validation, MCP routing, tools_used/data_sources tracking (TD-083)
Sprint 12.2 Revisited: Engine routing activated — use_engine_route now conditionally enabled via _resolve_engine_type(), connecting all 6 autonomous agent runtimes to the execution pipeline (TD-084)
1,950 unit tests + 50 integration, 0 failures, lint clean
Sprint 12.8: Pipeline robustness — smart auto-upgrade model selection, engine output URL extraction, DEGRADED validation fix for engine-routed subtasks (TD-085)

Phase 13 — Multi-Agent Collaboration: COMPLETE (2026-03-22)
Sprint 13.1: Iterative Multi-Round Discussion — configurable N-round discussion with model rotation, critique prompt, early stop on failure (TD-086)
Sprint 13.2: Engine-Routed Discussion — autonomous agent runtimes in discussion rounds, LLM API fallback (TD-087)
Sprint 13.3: Dynamic Agent Role Assignment — free-form roles from user input (regex) or LLM generation, injected into all execution paths + discussion (TD-088)
Sprint 13.4a: Artifact-Aware Planning — input/output artifacts on SubTask, produces/consumes on PlanStep, artifact chaining in execution + discussion (TD-089)
Sprint 13.4b: Artifact Runtime Extraction — ArtifactExtractor domain service, smart code/URL/JSON parsing from engine output, keyword heuristic matching (TD-090)
Sprint 13.5: Adaptive Discussion Strategy — ConvergenceDetector domain service, Jaccard + agreement/divergence signals, early stop on convergence (TD-091)
Sprint 14.1: Engine Cost Tracking — EngineCostCalculator domain service, 4 CLI drivers cost integration, Gemini CLI install, MCP config docs (TD-092)
2,132 unit tests + 50 integration, 0 failures, lint clean

Phase 11 — Intelligence Quality: COMPLETE (2026-03-17)
Sprint 11.1: Semantic Dedup — EmbeddingPort-based cosine similarity dedup in InsightExtractor (TD-075)
Sprint 11.2: Hybrid Classifier — MemoryClassifierPort ABC + regex-first/LLM-fallback HybridMemoryClassifier (TD-076)
Sprint 11.3: Benchmark + Docs — paraphrased_facts scenario, TD-075/TD-076, component count updates
1,737 unit tests + 50 integration, 0 failures, lint clean

Phase 9 — Intelligence Layer: COMPLETE (2026-03-14)
Sprint 9.1: Smart Decomposition — TaskComplexityClassifier (SIMPLE/MEDIUM/COMPLEX) + adaptive decomposition (TD-067)
Sprint 9.2: LAEE Code Execution — code block extraction + shell_exec execution + SubTask code/output fields (TD-068)
Sprint 9.3: UI Result Formatting — CodeBlock + ExecutionResult + resultParser components (TD-069)
Sprint 9.4: Interactive Planning Default — 3-mode plan-first flow (interactive/auto/disabled) (TD-070)
Sprint 9.5: Full-Stack Structured Logging — centralized backend logging + frontend API tracing (TD-071)
1,680 unit tests + 50 integration, 0 failures, lint clean


Layer Overview

Interface → Application → Domain ← Infrastructure
                                    (Dependency Inversion)
┌──────────────────────────────────────────────────────────────┐
│  interface/          Layer 4: Entry Points                    │
│  ├── api/            FastAPI routes, WebSocket handlers       │
│  └── cli/            typer CLI commands + rich formatting     │
├──────────────────────────────────────────────────────────────┤
│  application/        Layer 3: Use Cases                       │
│  ├── use_cases/      Orchestration of domain operations       │
│  └── dto/            Data Transfer Objects between layers     │
├──────────────────────────────────────────────────────────────┤
│  domain/             Layer 1: Pure Business Logic             │
│  ├── entities/       Pydantic models (strict=True)           │
│  ├── value_objects/  Enums, immutable types                   │
│  ├── ports/          ABC interfaces (Dependency Inversion)    │
│  └── services/       Pure domain services (no I/O)           │
├──────────────────────────────────────────────────────────────┤
│  infrastructure/     Layer 2: Port Implementations            │
│  ├── persistence/    SQLAlchemy ORM, pgvector, Neo4j         │
│  ├── llm/            LiteLLM, Ollama adapters                │
│  ├── local_execution/ LAEE tool implementations              │
│  ├── memory/         mem0, vector DB adapters                │
│  └── marketplace/    MCP registry client, tool installer     │
├──────────────────────────────────────────────────────────────┤
│  shared/             Cross-cutting Concerns                   │
│  └── config.py       pydantic-settings (env vars)            │
├──────────────────────────────────────────────────────────────┤
│  tests/                                                       │
│  ├── unit/domain/    No DB, no I/O. Fast (0.03s for 67)     │
│  ├── unit/application/ Ports mocked, no DB                   │
│  ├── integration/    Docker Compose required                  │
│  └── e2e/            Full stack                               │
└──────────────────────────────────────────────────────────────┘

Dependency Rules

Allowed

| From | To | Example |
|---|---|---|
| interface/ | application/ | API route calls use case |
| application/ | domain/ | Use case uses entity + port |
| infrastructure/ | domain/ | Adapter implements port ABC |
| Any layer | shared/ | Config, logging |

Forbidden

| From | To | Why |
|---|---|---|
| domain/ | infrastructure/ | Domain must be pure. No SQLAlchemy, no HTTP, no LiteLLM |
| domain/ | application/ | Domain has zero outward dependencies |
| domain/ | interface/ | Domain has zero outward dependencies |
| application/ | infrastructure/ | Use cases depend on ports (ABCs), not implementations |
| interface/ | infrastructure/ | Interface calls use cases, not adapters directly |
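
The forbidden pairs above lend themselves to an automated check. The following is a minimal sketch, not the project's actual tooling: it parses a module's source with the standard `ast` module and reports imports that cross a forbidden boundary. The names `FORBIDDEN`, `imported_layers`, and `violations` are hypothetical, and the sketch assumes each layer is importable as a top-level package.

```python
import ast

# Forbidden (importing layer -> imported layers) pairs, copied from the table above.
FORBIDDEN = {
    "domain": {"infrastructure", "application", "interface"},
    "application": {"infrastructure"},
    "interface": {"infrastructure"},
}


def imported_layers(source: str) -> set[str]:
    """Return the top-level package names that a module's source imports."""
    layers: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            layers.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" only; relative imports stay inside a layer.
            layers.add(node.module.split(".")[0])
    return layers


def violations(layer: str, source: str) -> set[str]:
    """Return the forbidden layers imported by a module that lives in `layer`."""
    return imported_layers(source) & FORBIDDEN.get(layer, set())


bad = violations("domain", "from infrastructure.persistence import models\n")
ok = violations("application", "from domain.ports.task_repository import TaskRepository\n")
```

A check like this could run over every file in a unit test. A composition root such as a DI container would presumably need an explicit exemption, since wiring adapters into use cases requires importing both.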

Dependency Inversion in Practice

# domain/ports/task_repository.py — defines the interface
from abc import ABC, abstractmethod

from domain.entities.task import TaskEntity


class TaskRepository(ABC):
    @abstractmethod
    async def save(self, task: TaskEntity) -> None: ...

# infrastructure/persistence/pg_task_repository.py — implements it
class PgTaskRepository(TaskRepository):
    async def save(self, task: TaskEntity) -> None:
        # SQLAlchemy code here — domain doesn't know about this
        ...

# application/use_cases/create_task.py — depends on the port
class CreateTaskUseCase:
    def __init__(self, repo: TaskRepository):  # ABC, not PgTaskRepository
        self._repo = repo
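
One practical consequence of depending on the port is that a use case can be exercised without a database. The sketch below is self-contained and illustrative only: `TaskEntity` is reduced to a two-field dataclass stand-in, and `InMemoryTaskRepository` and the `execute` method are hypothetical names, not the project's actual signatures.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TaskEntity:  # simplified stand-in for domain.entities.task.TaskEntity
    id: str
    goal: str


class TaskRepository(ABC):
    @abstractmethod
    async def save(self, task: TaskEntity) -> None: ...


class InMemoryTaskRepository(TaskRepository):  # hypothetical test double
    def __init__(self) -> None:
        self.saved: dict[str, TaskEntity] = {}

    async def save(self, task: TaskEntity) -> None:
        self.saved[task.id] = task


class CreateTaskUseCase:
    def __init__(self, repo: TaskRepository) -> None:
        self._repo = repo  # any TaskRepository implementation works here

    async def execute(self, task_id: str, goal: str) -> TaskEntity:  # hypothetical method
        task = TaskEntity(id=task_id, goal=goal)
        await self._repo.save(task)
        return task


repo = InMemoryTaskRepository()
task = asyncio.run(CreateTaskUseCase(repo).execute("t-1", "write docs"))
```

Swapping `InMemoryTaskRepository` for a PostgreSQL-backed implementation requires no change to `CreateTaskUseCase`, which is the point of the inversion.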

CLI + API: Dual Interface Design

Both CLI and API are first-class interfaces. They call the same use cases with zero logic duplication.

┌─────────────────────────────────────────────────────────┐
│                      interface/                          │
│                                                          │
│  ┌── api/ ──────────────┐  ┌── cli/ ──────────────────┐ │
│  │ FastAPI + WebSocket   │  │ typer + rich              │ │
│  │                       │  │                           │ │
│  │ POST /api/tasks  ─────┼──┼── morphic task create ──────┤ │
│  │ GET  /api/cost   ─────┼──┼── morphic cost summary ─────┤ │
│  │ GET  /api/models ─────┼──┼── morphic model list ───────┤ │
│  │ GET  /api/engines ────┼──┼── morphic engine list ──────┤ │
│  │ POST /api/engines/run ┼──┼── morphic engine run  ──────┤ │
│  │ GET  /api/marketplace ┼──┼── morphic marketplace ──────┤ │
│  └───────────────────────┘  └───────────────────────────┘ │
│              │                          │                  │
│              └──────────┬───────────────┘                  │
│                         ▼                                  │
│              application/use_cases/                        │
│              ├── create_task.py                            │
│              ├── route_to_engine.py                        │
│              └── ...                                       │
└─────────────────────────────────────────────────────────┘
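
The diagram above can be reduced to a small sketch in which two thin adapters translate their own input format and delegate to a single use case. This is an assumption-laden illustration using only the standard library: the real interfaces are FastAPI routes and typer commands, the real use case is async and takes ports, and the function names `api_create_task` and `cli_create_task` are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CreateTaskUseCase:  # simplified, synchronous stand-in
    created: list[str] = field(default_factory=list)

    def execute(self, goal: str) -> dict:
        self.created.append(goal)
        return {"id": len(self.created), "goal": goal}


def api_create_task(use_case: CreateTaskUseCase, body: str) -> str:
    """API-style adapter: parse a JSON request body, return a JSON response."""
    result = use_case.execute(json.loads(body)["goal"])
    return json.dumps(result)


def cli_create_task(use_case: CreateTaskUseCase, argv: list[str]) -> str:
    """CLI-style adapter: read the goal from argv, return a line of text."""
    result = use_case.execute(" ".join(argv))
    return f"Created task {result['id']}: {result['goal']}"


use_case = CreateTaskUseCase()
api_out = api_create_task(use_case, '{"goal": "summarize repo"}')
cli_out = cli_create_task(use_case, ["fix", "lint"])
```

Each adapter only parses input and formats output; everything that decides what a task is lives in the use case, so neither interface can drift from the other.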

CLI Architecture (Phase 2 + 4 + 5 + 6)

interface/cli/
├── main.py            # typer.Typer() app, entry point: `morphic`
├── commands/
│   ├── task.py        # morphic task {create|list|show|cancel}
│   ├── model.py       # morphic model {list|status|pull|delete|switch|info} (Sprint 5.5)
│   ├── cost.py        # morphic cost {summary|budget}
│   ├── plan.py        # morphic plan {create|list|show|approve|reject}
│   ├── mcp.py         # morphic mcp server
│   ├── engine.py      # morphic engine {list|run} (Sprint 4.3)
│   ├── marketplace.py # morphic marketplace {search|install|list|suggest|uninstall} (Sprint 5.3)
│   └── evolution.py   # morphic evolution {stats|failures|update|report} (Sprint 6.5)
└── formatters.py      # rich-based tables, progress bars, syntax highlighting

Why Both?

| Interface | Best For |
|---|---|
| API + UI | Visual task graph, real-time monitoring, non-technical users |
| CLI | Scriptable automation, CI/CD, power users, lightweight environments |

OSS Dependency Map

Custom code is minimized. Every infrastructure component wraps an established OSS library.

Infrastructure Layer — All OSS

| Component | OSS Library | What We Write |
|---|---|---|
| Task DAG | LangGraph | Wrapper + state definitions |
| LLM routing | LiteLLM | Routing logic + cost callbacks |
| Structured output | Instructor | Schema definitions only |
| Semantic memory | mem0 | Hierarchy management |
| Embedding | Ollama /api/embed | OllamaEmbeddingAdapter (Sprint 3.1) |
| LSH bucketing | numpy | SemanticFingerprint + SemanticBucketStore (Sprint 3.1) |
| Vector search | pgvector | Cosine distance queries (Sprint 3.1) |
| Knowledge graph | Neo4j driver | Cypher query builder |
| ORM + migrations | SQLAlchemy + Alembic | Models + migration scripts |
| API framework | FastAPI | Route handlers |
| CLI framework | typer + rich | Command handlers + formatters |
| MCP server/client | mcp (FastMCP) | 6 tools + 2 resources, client adapter |
| Browser automation | Playwright | Tool wrappers |
| File watching | watchdog | Event handlers |
| Scheduling | APScheduler | Job definitions |
| Process management | psutil | Read-only wrappers |
| Configuration | pydantic-settings | Settings class |
| Task queue | Celery + Redis | Task definitions |
| Logging | structlog | Config only |
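
To make the "LSH bucketing" row concrete, here is a minimal sketch of one common LSH scheme, random-hyperplane hashing, written with the standard library instead of numpy so it stays self-contained. Whether SemanticFingerprint uses this exact scheme is an assumption, and the function names and parameters below are hypothetical.

```python
import random


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw n_bits random hyperplane normals of dimension dim (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_bucket(vector: list[float], hyperplanes: list[list[float]]) -> int:
    """Pack the sign of each hyperplane projection into an integer bucket id."""
    bucket = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bucket = (bucket << 1) | (1 if dot >= 0 else 0)
    return bucket


planes = make_hyperplanes(dim=4, n_bits=8)
v = [0.3, -1.2, 0.5, 2.0]
bucket_v = lsh_bucket(v, planes)
bucket_scaled = lsh_bucket([2 * x for x in v], planes)  # same direction, twice the length
```

Because only the sign of each projection matters, vectors pointing in the same direction share a bucket regardless of length, and vectors at a small angle are likely to share one. That is what makes bucket lookup a cheap pre-filter ahead of an exact cosine comparison.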

Domain Layer — Custom (Intentionally)

| Module | Why Custom |
|---|---|
| domain/entities/* | Business models are inherently project-specific |
| domain/services/risk_assessor.py | 40+ tool risk classification is domain logic |
| domain/services/approval_engine.py | 3×5 approval matrix is domain logic |
| domain/services/semantic_fingerprint.py | LSH hash + cosine similarity is pure math (Sprint 3.1) |
| domain/services/forgetting_curve.py | Ebbinghaus retention scoring R=e^(-t/S) is pure math (Sprint 3.3) |
| domain/services/delta_encoder.py | Git-style delta hashing/reconstruction/diffing is pure logic (Sprint 3.4) |
| domain/services/hierarchical_summarizer.py | 4-level tree compression: extractive summarization + level selection is pure logic (Sprint 3.5) |
| domain/services/agent_engine_router.py | Two-tier routing: task characteristics → execution engine selection, pure static (Sprint 4.1) |
| domain/services/tool_safety_scorer.py | Multi-signal safety scoring: publisher trust + transport + popularity + metadata (Sprint 5.1) |
| domain/services/failure_analyzer.py | Regex error pattern → MCP search query mapping, pure static (Sprint 5.4) |
| domain/services/memory_classifier.py | Regex-based text → CognitiveMemoryType classification, pure static (Sprint 7.3) |
| domain/services/conflict_resolver.py | Jaccard+negation contradiction detection, confidence-weighted resolution, pure static (Sprint 7.3) |
| domain/value_objects/* | Project-specific enums and types |
| domain/ports/* | Interface definitions are project-specific |

Rule: If something exists in PyPI/npm and covers 80%+ of the need, use it. Only write custom code for domain logic.
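
As an illustration of why such services stay custom and dependency-free, the Ebbinghaus formula R=e^(-t/S) listed for forgetting_curve.py fits in a few lines of pure Python. This is a sketch under stated assumptions, not the project's implementation: the function names, the choice of hours as the unit, and the 0.3 expiry threshold are all hypothetical.

```python
import math


def retention(hours_since_access: float, stability_hours: float) -> float:
    """Ebbinghaus retention R = e^(-t/S), with t and S in the same unit (hours here)."""
    if stability_hours <= 0:
        raise ValueError("stability must be positive")
    # Clamp negative elapsed time to zero so retention never exceeds 1.0.
    return math.exp(-max(hours_since_access, 0.0) / stability_hours)


def is_expired(hours_since_access: float, stability_hours: float,
               threshold: float = 0.3) -> bool:
    """Treat a memory as expired once its retention drops below the threshold."""
    return retention(hours_since_access, stability_hours) < threshold


fresh = retention(0, 24)         # just accessed
one_period = retention(24, 24)   # t == S, so R = 1/e
```

With S = 24 hours, a memory untouched for 48 hours has R = e^(-2), roughly 0.135, and would count as expired under the assumed threshold, while one accessed 12 hours ago (R roughly 0.607) would not.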


Current File Map

morphic-agent/
├── domain/                          # Layer 1: Pure Business Logic
│   ├── entities/
│   │   ├── task.py                  # TaskEntity, SubTask (strict)
│   │   ├── execution.py            # Action, Observation, UndoAction (strict)
│   │   ├── memory.py               # MemoryEntry (strict)
│   │   ├── cost.py                 # CostRecord (strict)
│   │   ├── delta.py                # Delta — Git-style state change record (strict, Sprint 3.4)
│   │   ├── plan.py                 # PlanStep, ExecutionPlan (strict)
│   │   ├── tool_candidate.py       # ToolCandidate — MCP tool metadata (strict, Sprint 5.1)
│   │   ├── execution_record.py    # ExecutionRecord — single execution outcome (strict, Sprint 6.1)
│   │   └── strategy.py            # RecoveryRule, ModelPreference, EnginePreference (strict, Sprint 6.3)
│   ├── value_objects/
│   │   ├── status.py               # TaskStatus, SubTaskStatus, ObservationStatus, MemoryType, PlanStatus
│   │   ├── risk_level.py           # RiskLevel (5-tier IntEnum)
│   │   ├── approval_mode.py        # ApprovalMode (3-tier)
│   │   ├── model_tier.py           # ModelTier, TaskType (+LONG_RUNNING_DEV, +WORKFLOW_PIPELINE)
│   │   ├── agent_engine.py         # AgentEngineType (6 engines, Sprint 4.1)
│   │   ├── tool_safety.py          # SafetyTier (4-tier IntEnum: UNSAFE→VERIFIED, Sprint 5.1)
│   │   └── evolution.py            # EvolutionLevel (TACTICAL, STRATEGIC, SYSTEMIC, Sprint 6.1)
│   ├── ports/
│   │   ├── task_repository.py      # TaskRepository ABC
│   │   ├── task_engine.py          # TaskEngine ABC (decompose + execute)
│   │   ├── llm_gateway.py          # LLMGateway ABC + LLMResponse
│   │   ├── local_executor.py       # LocalExecutor ABC (LAEE)
│   │   ├── audit_logger.py         # AuditLogger ABC
│   │   ├── memory_repository.py    # MemoryRepository ABC (+ list_by_type, Sprint 3.3)
│   │   ├── cost_repository.py      # CostRepository ABC
│   │   ├── plan_repository.py      # PlanRepository ABC
│   │   ├── embedding.py            # EmbeddingPort ABC (Sprint 3.1)
│   │   ├── mcp_client.py           # MCPClientPort ABC (Sprint 3.8)
│   │   ├── agent_engine.py         # AgentEnginePort ABC + Result + Capabilities (Sprint 4.1)
│   │   ├── tool_registry.py        # ToolRegistryPort ABC + ToolSearchResult (Sprint 5.2)
│   │   ├── tool_installer.py       # ToolInstallerPort ABC + InstallResult (Sprint 5.3)
│   │   ├── execution_record_repository.py # ExecutionRecordRepository ABC + ExecutionStats (Sprint 6.1)
│   │   ├── shared_task_state_repository.py # SharedTaskStateRepository ABC (Sprint 7.1)
│   │   ├── insight_extractor.py    # InsightExtractorPort ABC + ExtractedInsight (Sprint 7.1)
│   │   ├── context_adapter.py      # ContextAdapterPort ABC + AdapterInsight (Sprint 7.2)
│   │   └── memory_classifier.py   # MemoryClassifierPort ABC — text→CognitiveMemoryType (Sprint 11.2)
│   └── services/
│       ├── risk_assessor.py        # 40+ tool risk mapping + escalation
│       ├── approval_engine.py      # 3-mode × 5-risk approval matrix
│       ├── semantic_fingerprint.py # LSH hash + cosine similarity (Sprint 3.1)
│       ├── forgetting_curve.py    # Ebbinghaus retention scoring R=e^(-t/S) (Sprint 3.3)
│       ├── delta_encoder.py      # hash_changes, reconstruct, create_delta, compute_diff (Sprint 3.4)
│       ├── hierarchical_summarizer.py # estimate_tokens, split_sentences, extract_summary, build_hierarchy, select_level (Sprint 3.5)
│       ├── agent_engine_router.py # select, get_fallback_chain, select_with_fallbacks — pure static (Sprint 4.1)
│       ├── tool_safety_scorer.py  # score() — publisher trust + transport + popularity + metadata (Sprint 5.1)
│       ├── failure_analyzer.py    # extract_queries() — regex error→search query mapping (Sprint 5.4)
│       ├── tactical_recovery.py   # find_alternative, create_rule, rank_rules — pure static (Sprint 6.2)
│       ├── memory_classifier.py   # classify(), classify_with_confidence() — regex→CognitiveMemoryType (Sprint 7.3)
│       ├── conflict_resolver.py   # detect_conflicts(), resolve(), resolve_all() — Jaccard+negation (Sprint 7.3)
│       └── engine_cost_calculator.py # calculate(), calculate_detailed(), estimate_from_duration() — token→USD (Sprint 14.1)
│
├── application/                     # Layer 3: Use Cases
│   ├── use_cases/
│   │   ├── create_task.py          # CreateTaskUseCase (decompose + persist)
│   │   ├── execute_task.py         # ExecuteTaskUseCase (run DAG + persist + insight extraction hook)
│   │   ├── cost_estimator.py       # CostEstimator (per-model token pricing)
│   │   ├── interactive_plan.py     # InteractivePlanUseCase (create/approve/reject)
│   │   ├── background_planner.py   # BackgroundPlannerUseCase (advisory monitoring)
│   │   ├── route_to_engine.py      # RouteToEngineUseCase (engine selection + fallback execution, Sprint 4.3)
│   │   ├── install_tool.py         # InstallToolUseCase (search + score + install, Sprint 5.3)
│   │   ├── discover_tools.py       # DiscoverToolsUseCase (failure → suggest tools, Sprint 5.4)
│   │   ├── manage_ollama.py        # ManageOllamaUseCase (status/pull/delete/switch, Sprint 5.5)
│   │   ├── analyze_execution.py   # AnalyzeExecutionUseCase (record + stats + failure patterns, Sprint 6.1)
│   │   ├── update_strategy.py     # UpdateStrategyUseCase (model/engine prefs + recovery rules, Sprint 6.3)
│   │   ├── systemic_evolution.py  # SystemicEvolutionUseCase (tool gap detection + evolution report, Sprint 6.4)
│   │   └── extract_insights.py   # ExtractInsightsUseCase (extract → resolve → store → update state, Sprint 7.3)
│   └── dto/                         # (stub — Sprint 1.4)
│
├── infrastructure/                  # Layer 2: Port Implementations
│   ├── persistence/
│   │   ├── database.py              # Async SQLAlchemy engine + session
│   │   ├── models.py                # ORM models (Vector(384) for embedding)
│   │   ├── in_memory.py             # InMemory repos (optional embedding_port)
│   │   ├── pg_task_repository.py    # PgTaskRepository (TaskEntity ↔ TaskModel)
│   │   ├── pg_cost_repository.py    # PgCostRepository (SQL aggregation)
│   │   ├── pg_memory_repository.py  # PgMemoryRepository (pgvector cosine + ILIKE fallback)
│   │   ├── pg_plan_repository.py    # PgPlanRepository (ExecutionPlan ↔ PlanModel)
│   │   ├── in_memory_execution_record.py # InMemoryExecutionRecordRepository (Sprint 6.1)
│   │   └── shared_task_state_repo.py # InMemorySharedTaskStateRepository (Sprint 7.2)
│   ├── llm/                         # Sprint 1.2 + 5.5: LLM Layer
│   │   ├── ollama_manager.py        # Ollama lifecycle (health, pull, delete, info, running, Sprint 5.5)
│   │   ├── litellm_gateway.py       # LLMGateway impl (LOCAL_FIRST routing + LiteLLM)
│   │   └── cost_tracker.py          # CostRepository wrapper + budget checking
│   ├── task_graph/                  # Sprint 1.3: Task Graph Engine
│   │   ├── state.py                 # AgentState TypedDict (LangGraph state)
│   │   ├── intent_analyzer.py       # LLM goal → subtask decomposition
│   │   └── engine.py                # LangGraphTaskEngine (DAG + parallel + retry)
│   ├── queue/                       # Sprint 2-B: Celery + Redis
│   │   ├── celery_app.py            # Celery app factory (Redis broker/backend)
│   │   └── tasks.py                 # execute_task_worker Celery task
│   ├── local_execution/             # Sprint 1.3b + 2-E: LAEE
│   │   ├── executor.py              # LocalExecutor (risk → approve → execute → audit)
│   │   ├── audit_log.py             # JsonlAuditLogger (append-only JSONL)
│   │   ├── undo_manager.py          # Stack-based undo for reversible ops
│   │   └── tools/
│   │       ├── shell_tools.py       # shell_exec, shell_background, shell_stream, shell_pipe
│   │       ├── fs_tools.py          # fs_read, fs_write, fs_edit, fs_delete, fs_move, fs_glob, fs_tree
│   │       ├── system_tools.py      # process_list, process_kill, resource_info, clipboard, notify
│   │       ├── dev_tools.py         # dev_git, dev_docker, dev_pkg_install, dev_env_setup
│   │       ├── browser_tools.py     # navigate, click, type, screenshot, extract, pdf (Playwright)
│   │       ├── gui_tools.py         # applescript, open_app, screenshot_ocr, accessibility (macOS)
│   │       └── cron_tools.py        # schedule, once, list, cancel (APScheduler)
│   ├── memory/                      # Sprint 1.5 + 3.1–3.5: Semantic Memory
│   │   ├── memory_hierarchy.py      # L1-L4 unified manager + compact() + record_delta/get_state + summarize_entry/retrieve_at_depth
│   │   ├── context_zipper.py        # Query-adaptive compression (v2: async, semantic, Sprint 3.2)
│   │   ├── forgetting_curve.py      # ForgettingCurveManager + CompactResult (Sprint 3.3)
│   │   ├── delta_encoder.py         # DeltaEncoderManager + DeltaRecordResult (Sprint 3.4)
│   │   ├── hierarchical_summarizer.py # HierarchicalSummaryManager + SummarizeResult (Sprint 3.5)
│   │   ├── context_bridge.py        # ContextBridge + ExportResult — 4 platform formatters (Sprint 3.6)
│   │   ├── knowledge_graph.py       # Neo4j adapter (L3)
│   │   ├── semantic_fingerprint.py  # SemanticBucketStore (LSH bucketing, Sprint 3.1)
│   │   └── embedding_adapters.py    # OllamaEmbeddingAdapter (POST /api/embed, Sprint 3.1)
│   ├── mcp/                         # Sprint 3.7–3.8: Model Context Protocol
│   │   ├── server.py                # create_mcp_server() — FastMCP, 6 tools + 2 resources
│   │   └── client.py               # MCPClient, MCPToolAdapter, discover_and_register
│   ├── agent_cli/                   # Sprint 4.1–4.6: Agent CLI Orchestration
│   │   ├── __init__.py              # Re-exports all 6 drivers
│   │   ├── _subprocess_base.py      # CLIResult + SubprocessMixin (shared by CLI drivers)
│   │   ├── ollama_driver.py         # OllamaEngineDriver (wraps LiteLLMGateway, $0)
│   │   ├── claude_code_driver.py    # ClaudeCodeDriver (claude -p, 200K ctx)
│   │   ├── codex_cli_driver.py      # CodexCLIDriver (codex exec, sandbox+MCP)
│   │   ├── gemini_cli_driver.py     # GeminiCLIDriver (gemini -p, 2M ctx)
│   │   ├── openhands_driver.py      # OpenHandsDriver (httpx REST, Docker sandbox)
│   │   ├── adk_driver.py            # ADKDriver (Google ADK Python SDK, 2M ctx, Sprint 4.5)
│   │   └── knowledge_loader.py      # KnowledgeFileLoader (engine→file mapping, Sprint 4.6)
│   ├── cognitive/                   # Sprint 7.2-7.3 + 11.1-11.2: UCL + Semantic Dedup + Hybrid Classifier
│   │   ├── __init__.py              # Re-exports InsightExtractor (Sprint 7.3)
│   │   ├── insight_extractor.py     # InsightExtractorPort impl — semantic dedup + reclassify (Sprint 7.3, 11.1)
│   │   ├── hybrid_memory_classifier.py # MemoryClassifierPort impl — regex + LLM fallback (Sprint 11.2)
│   │   └── adapters/
│   │       ├── __init__.py          # Re-exports all 6 adapters
│   │       ├── _base.py             # Shared helpers (format, truncate, regex patterns)
│   │       ├── claude_code.py       # ClaudeCodeContextAdapter (CLAUDE.md format, 200K ctx)
│   │       ├── gemini.py            # GeminiContextAdapter (XML blocks, 2M ctx)
│   │       ├── codex.py             # CodexContextAdapter (AGENTS.md format)
│   │       ├── ollama.py            # OllamaContextAdapter (ultra-compact, small window)
│   │       ├── openhands.py         # OpenHandsContextAdapter (REST task context)
│   │       └── adk.py               # ADKContextAdapter (workflow-context XML)
│   ├── marketplace/                 # Sprint 5.2–5.3: Marketplace
│   │   ├── __init__.py              # Re-exports MCPRegistryClient, MCPToolInstaller
│   │   ├── mcp_registry_client.py   # MCPRegistryClient(ToolRegistryPort) — httpx, auto-score (Sprint 5.2)
│   │   └── tool_installer.py        # MCPToolInstaller(ToolInstallerPort) — npm/pip subprocess (Sprint 5.3)
│   └── evolution/                   # Sprint 6.3: Self-Evolution Engine
│       ├── __init__.py              # Re-exports StrategyStore
│       └── strategy_store.py        # StrategyStore — JSONL persistence (recovery rules + preferences)
│
├── interface/                       # Layer 4: Entry Points
│   ├── api/                         # Sprint 1.6: FastAPI + WebSocket
│   │   ├── main.py                  # create_app() factory + lifespan + CORS
│   │   ├── container.py             # AppContainer DI (Settings → repos → use cases)
│   │   ├── schemas.py               # 50+ Pydantic request/response models (+UCL cognitive schemas, Sprint 7.5)
│   │   ├── websocket.py             # /ws/tasks/{id} (poll + delta-only sends + recommendations)
│   │   └── routes/
│   │       ├── tasks.py             # POST, GET, GET/{id}, DELETE /api/tasks (+ Celery dispatch)
│   │       ├── plans.py             # POST, GET, approve, reject /api/plans
│   │       ├── models.py            # GET /api/models/status, POST /pull, DELETE /{name}, POST /switch, GET /running (Sprint 5.5)
│   │       ├── cost.py              # GET /api/cost, GET /api/cost/logs
│   │       ├── memory.py            # GET /api/memory/search?q= + GET /api/memory/export?platform=
│   │       ├── engines.py           # GET /api/engines, GET /api/engines/{type}, POST /api/engines/run (Sprint 4.3)
│   │       ├── marketplace.py       # GET /search, POST /install, GET /installed, POST /suggest, DELETE /{name} (Sprint 5.3)
│   │       ├── evolution.py        # GET /stats, GET /failures, GET /preferences, POST /update, POST /evolve (Sprint 6.1)
│   │       ├── cognitive.py       # GET/DELETE /state, GET /affinity, POST /handoff, POST /insights/extract (Sprint 7.5)
│   │       └── benchmarks.py     # POST /api/benchmarks/{run|continuity|dedup} (Sprint 7.6)
│   └── cli/                         # Sprint 2.9-2.11: typer + rich
│       ├── main.py                  # typer app, _get_container() lazy singleton, _run() async bridge
│       ├── formatters.py            # Rich tables, trees, status styles, safety badge (all output isolated here)
│       └── commands/
│           ├── task.py              # morphic task {create|list|show|cancel}
│           ├── model.py             # morphic model {list|status|pull|delete|switch|info} (Sprint 5.5)
│           ├── cost.py              # morphic cost {summary|budget}
│           ├── plan.py              # morphic plan {create|list|show|approve|reject}
│           ├── mcp.py               # morphic mcp server (stdio/streamable-http)
│           ├── engine.py            # morphic engine {list|run} (Sprint 4.3)
│           ├── marketplace.py       # morphic marketplace {search|install|list|suggest|uninstall} (Sprint 5.3)
│           ├── evolution.py        # morphic evolution {stats|failures|update|report} (Sprint 6.1)
│           ├── cognitive.py       # morphic cognitive {state|delete|affinity|handoff|insights} (Sprint 7.5)
│           └── benchmark.py      # morphic benchmark {run|continuity|dedup} (Sprint 7.6)
│
├── shared/
│   └── config.py                    # pydantic-settings (all env vars + marketplace + evolution settings)
│
├── ui/                              # Sprint 1.6 + 2-F + 5.6: Next.js 15 (bun, Tailwind CSS 4, @xyflow/react)
│   ├── lib/
│   │   ├── theme.ts                 # morphicAgentTheme design tokens
│   │   └── api.ts                   # Typed fetch wrappers + WebSocket + Plan/Marketplace/Model/Evolution/Cognitive/Benchmark API
│   ├── app/
│   │   ├── layout.tsx               # Dark theme root layout (Geist font, nav: Marketplace/Models/Evolution/Cognitive/Benchmarks)
│   │   ├── globals.css              # CSS variables matching design spec
│   │   ├── page.tsx                 # Dashboard (Execute/Plan toggle + GoalInput + TaskList)
│   │   ├── tasks/[id]/page.tsx      # Task detail + TaskGraph with live WebSocket
│   │   ├── plans/[id]/page.tsx      # Plan review page (approve/reject)
│   │   ├── marketplace/             # Sprint 5.6: MCP tool marketplace
│   │   │   ├── page.tsx             # Search/browse tools + installed tab
│   │   │   └── components/
│   │   │       ├── SearchBar.tsx    # Debounced search input (400ms)
│   │   │       ├── ToolCard.tsx     # Tool result card (SafetyBadge + InstallButton)
│   │   │       ├── SafetyBadge.tsx  # Color-coded safety tier indicator (4 tiers)
│   │   │       └── InstallButton.tsx # Install/uninstall with confirm dialog
│   │   ├── models/                  # Sprint 5.6: Ollama model management
│   │   │   └── page.tsx             # Pull/delete/switch models + running status
│   │   ├── evolution/               # Sprint 6.5: Evolution dashboard
│   │   │   └── page.tsx             # Stats, failure patterns, preferences, evolution trigger
│   │   ├── cognitive/               # Sprint 7.5: UCL cognitive dashboard
│   │   │   ├── page.tsx             # Shared states + affinity scores (tab UI)
│   │   │   └── components/
│   │   │       ├── StateCard.tsx    # State card (task_id, last_agent, counts)
│   │   │       ├── StateDetail.tsx  # Full state detail (decisions, artifacts, blockers, history)
│   │   │       └── AffinityTable.tsx # Affinity scores table (engine, topic, score)
│   │   └── benchmarks/              # Sprint 7.6: Benchmark dashboard
│   │       └── page.tsx             # Run benchmarks, view continuity + dedup results
│   └── components/
│       ├── GoalInput.tsx            # Textarea + Execute button (Enter to submit)
│       ├── TaskList.tsx             # Task cards with status icons + FREE badge
│       ├── TaskDetail.tsx           # Subtask tree with status dots
│       ├── TaskGraph.tsx            # React Flow DAG visualizer (SubTaskNode + status colors)
│       ├── PlanningView.tsx         # Plan steps table + cost display + approve/reject
│       ├── CostMeter.tsx            # Budget bar + daily/monthly/local stats
│       └── ModelStatus.tsx          # Ollama status dot + model list
│
├── tests/
│   └── unit/
│       ├── domain/
│       │   ├── test_entities.py             # 37 tests (16 behavior + 21 strict validation)
│       │   ├── test_risk_assessor.py        # 19 tests (5 risk tiers + escalation)
│       │   ├── test_approval_engine.py      # 11 tests (3 modes × risk levels)
│       │   ├── test_semantic_fingerprint.py # 11 tests (LSH hash, cosine sim, Sprint 3.1)
│       │   ├── test_forgetting_curve.py   # 14 tests (retention score, expiry, hours_since, Sprint 3.3)
│       │   ├── test_delta_encoder.py     # 34 tests (hash, reconstruct, diff, entity validation, Sprint 3.4)
│       │   ├── test_hierarchical_summarizer.py # 27 tests (tokens, sentences, extract, hierarchy, select, depth, Sprint 3.5)
│       │   ├── test_agent_engine_router.py    # 36 tests (select, fallback, with_fallbacks, engine type, task type, Sprint 4.1)
│       │   ├── test_agent_engine_port.py      # 10 tests (result, capabilities, ABC, backward compat, Sprint 4.1)
│       │   ├── test_tool_candidate.py         # 8 tests (strict validation, defaults, fields, Sprint 5.1)
│       │   ├── test_tool_safety_scorer.py     # 14 tests (trusted publisher, suspicious, tier mapping, Sprint 5.1)
│       │   ├── test_failure_analyzer.py       # 12 tests (error patterns, query extraction, limits, Sprint 5.4)
│       │   ├── test_evolution_vo.py           # EvolutionLevel enum validation (Sprint 6.1)
│       │   ├── test_execution_record.py       # ExecutionRecord strict validation (Sprint 6.1)
│       │   ├── test_strategy_entities.py      # RecoveryRule, ModelPreference, EnginePreference (Sprint 6.3)
│       │   ├── test_tactical_recovery.py      # find_alternative, create_rule, rank (Sprint 6.2)
│       │   ├── test_memory_classifier.py     # 30 tests (classify + classify_with_confidence, Sprint 7.3)
│       │   └── test_conflict_resolver.py     # 25 tests (detect, resolve, resolve_all + helpers, Sprint 7.3)
│       ├── application/
│       │   ├── test_create_task.py      # 5 tests (decompose, save, status, deps)
│       │   ├── test_execute_task.py     # 11 tests (success, fallback, failed, cost + insight extraction, Sprint 7.3)
│       │   ├── test_route_to_engine.py  # 27 tests (list/get/execute happy/fallback/chain + context injection, Sprint 4.3+4.6)
│       │   ├── test_install_tool.py     # 9 tests (search, install, install_by_name, uninstall, list, Sprint 5.3)
│       │   ├── test_discover_tools.py   # 9 tests (suggest, dedup, sort, max_results, context, Sprint 5.4)
│       │   ├── test_manage_ollama.py    # 10 tests (status, pull, delete, switch, info, Sprint 5.5)
│       │   ├── test_analyze_execution.py # AnalyzeExecution (record, stats, failure patterns, Sprint 6.1)
│       │   ├── test_update_strategy.py   # UpdateStrategy (prefs, rules, full update, Sprint 6.3)
│       │   ├── test_systemic_evolution.py # SystemicEvolution (gaps, tools, report, Sprint 6.4)
│       │   └── test_extract_insights.py # 15 tests (extract, store, conflict resolve, task state update, Sprint 7.3)
│       ├── infrastructure/
│       │   ├── test_ollama_manager.py   # 20 tests (health, list, ensure, recommend + delete, info, running, Sprint 5.5)
│       │   ├── test_cost_tracker.py     # 13 tests (record, queries, budget)
│       │   ├── test_litellm_gateway.py  # 24 tests (route, complete, available, O-series temp)
│       │   ├── test_intent_analyzer.py  # 6 tests (decompose, deps, JSON parse)
│       │   ├── test_task_graph_engine.py # 9 tests (parallel, retry, cascade)
│       │   ├── test_local_execution.py  # 35 tests (8 completion criteria)
│       │   ├── test_failure_recovery.py # 7 tests (retry, cascade, partial, persistence)
│       │   ├── test_memory.py           # 36 tests (hierarchy, zipper, knowledge graph)
│       │   ├── test_semantic_search.py  # 20 tests (BucketStore, OllamaAdapter, vector search, Sprint 3.1)
│       │   ├── test_forgetting_curve.py # 17 tests (compact, promote, score, list_by_type, Sprint 3.3)
│       │   ├── test_delta_encoder.py   # 27 tests (record, get_state, history, topics, roundtrip, Sprint 3.4)
│       │   ├── test_hierarchical_summarizer.py # 24 tests (extractive, LLM, get_summary, retrieve_at_depth, Sprint 3.5)
│       │   ├── test_context_bridge.py  # 25 tests (4 platforms, export_all, budget, graceful degradation, Sprint 3.6)
│       │   ├── test_mcp_server.py      # 19 tests (factory, 6 tools, 2 resources, degradation, Sprint 3.7)
│       │   ├── test_mcp_client.py      # 21 tests (port, connect, tools, resources, adapter, discover, Sprint 3.8)
│       │   ├── test_adk_driver.py     # 17 tests (construct, available, run_task, capabilities, Sprint 4.5)
│       │   ├── test_knowledge_loader.py # 13 tests (construction, load, format, missing files, Sprint 4.6)
│       │   ├── test_mcp_registry_client.py # 14 tests (search, parse, error handling, Sprint 5.2)
│       │   ├── test_tool_installer.py  # 11 tests (install, safety gate, uninstall, tracking, Sprint 5.3)
│       │   ├── test_execution_record_repo.py # InMemoryExecutionRecordRepository CRUD (Sprint 6.1)
│       │   ├── test_strategy_store.py  # StrategyStore JSONL I/O (Sprint 6.3)
│       │   ├── test_shared_task_state_repo.py # 16 tests (CRUD, list_active, append, Sprint 7.2)
│       │   ├── test_context_adapters.py # 104 tests (6 adapters × inject/extract, Sprint 7.2)
│       │   └── test_insight_extractor.py # 13 tests (extract, dedup, reclassify, tags, Sprint 7.3)
│       └── interface/
│           ├── test_api.py              # 22 tests (CRUD, WebSocket, CORS, models, cost, memory)
│           ├── test_api_e2e.py          # 12 tests (HTTP round-trip: POST→execute→GET→verify)
│           ├── test_cli.py              # 20 tests (3 foundation + 9 task + 5 model + 3 cost)
│           ├── test_engine_api.py       # 12 tests (list/get/run engines, validation, Sprint 4.3)
│           ├── test_engine_cli.py       # 8 tests (engine list/run, flags, validation, Sprint 4.3)
│           ├── test_marketplace_api.py  # 10 tests (search, install, list, uninstall, validation, Sprint 5.3)
│           ├── test_marketplace_cli.py  # 6 tests (search, install, list, suggest, uninstall, Sprint 5.3)
│           ├── test_evolution_api.py    # Evolution API endpoints (stats, failures, update, evolve, Sprint 6.1)
│           ├── test_evolution_cli.py    # Evolution CLI commands (stats, failures, update, report, Sprint 6.1)
│           ├── test_cognitive_api.py   # 19 tests (state CRUD, affinity, handoff, insights, Sprint 7.5)
│           ├── test_cognitive_cli.py   # 11 tests (state, affinity, insights CLI commands, Sprint 7.5)
│           ├── test_benchmark_api.py  # 7 tests (run/continuity/dedup endpoints, field validation, Sprint 7.6)
│           └── test_benchmark_cli.py  # 4 tests (run/continuity/dedup/help CLI commands, Sprint 7.6)
│   └── integration/
│       ├── test_live_smoke.py           # 10 tests (real Ollama + real filesystem)
│       ├── test_cloud_llm.py            # 11 tests (Anthropic + OpenAI + Gemini + cost + routing)
│       ├── test_e2e_pipeline.py         # 5 tests (Goal → Decompose → DAG → Result)
│       ├── test_memory_hierarchy_full.py # 16 tests (L1-L4 full lifecycle, compression interplay, edge cases, cross-component, Sprint 3.10)
│       ├── test_agent_engines.py        # 37 tests (6 engines live + routing + fallback + availability, Sprint 4.4-4.5)
│       └── test_ucl_cross_engine.py    # 13 tests (handoff pipeline + adapter fidelity + insight roundtrip + affinity + conflict + benchmarks, Sprint 7.6)
│
├── benchmarks/                      # Sprint 7.6: Context continuity + dedup accuracy benchmarks
│   ├── __init__.py
│   ├── context_continuity.py       # AdapterScore, ContinuityResult, run_benchmark() — target >85%
│   ├── dedup_accuracy.py           # DedupScore, DedupResult, run_benchmark() — target >50%
│   └── runner.py                   # BenchmarkSuiteResult, run_all() — unified benchmark runner
│
├── migrations/                      # Alembic async migrations (001 initial + 002 embedding)
├── docker-compose.yml               # PostgreSQL+pgvector, Redis, Neo4j
├── pyproject.toml                   # uv project, ruff, pytest, mypy
└── CLAUDE.md                        # Project constitution

Domain Layer Design Principles

1. Entities are Pure Pydantic

  • ConfigDict(strict=True) — no type coercion
  • ConfigDict(validate_assignment=True) — validates on attribute mutation
  • All status fields use str, Enum value objects (not raw strings)
  • Numeric fields have Field(ge=0) constraints
  • String fields have Field(min_length=1) where empty is invalid
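
The rules above can be sketched as follows, assuming Pydantic v2. `ExampleTask`, its fields, and the `is_rejected` helper are hypothetical and exist only for illustration; they are not the project's actual entities.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    FALLBACK = "fallback"


class ExampleTask(BaseModel):
    """Hypothetical entity used only to illustrate the rules above."""

    # strict=True disables coercion; validate_assignment re-validates on mutation.
    model_config = ConfigDict(strict=True, validate_assignment=True)

    goal: str = Field(min_length=1)          # empty goal is invalid
    status: TaskStatus = TaskStatus.PENDING  # enum value object, not a raw string
    retries: int = Field(default=0, ge=0)    # numeric fields are non-negative


def is_rejected(**kwargs) -> bool:
    """Return True when ExampleTask refuses the given constructor arguments."""
    try:
        ExampleTask(**kwargs)
    except ValidationError:
        return True
    return False
```

With this sketch, `is_rejected(goal="x", retries="3")` is true because strict mode does not coerce the string `"3"` to an integer, and assigning `task.retries = -1` on an existing instance raises `ValidationError`.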

2. Value Objects are Immutable Enums

| Value Object | Type | Values |
|---|---|---|
| TaskStatus | str, Enum | pending, running, success, failed, fallback |
| SubTaskStatus | str, Enum | pending, running, success, failed |
| ObservationStatus | str, Enum | success, error, denied, timeout |
| MemoryType | str, Enum | l1_active, l2_semantic, l3_facts, l4_cold |
| PlanStatus | str, Enum | proposed, approved, rejected, executing, completed |
| RiskLevel | IntEnum | SAFE(0), LOW(1), MEDIUM(2), HIGH(3), CRITICAL(4) |
| ApprovalMode | str, Enum | full-auto, confirm-destructive, confirm-all |
| ModelTier | str, Enum | free, low, medium, high |
| TaskType | str, Enum | simple_qa, code_generation, ..., long_running_dev, workflow_pipeline |
| AgentEngineType | str, Enum | openhands, claude_code, gemini_cli, codex_cli, adk, ollama |
| SafetyTier | IntEnum | UNSAFE(0), EXPERIMENTAL(1), COMMUNITY(2), VERIFIED(3) |
| EvolutionLevel | str, Enum | tactical, strategic, systemic |
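
A short sketch of why the ordered value objects are `IntEnum` while the rest are `str, Enum`: ordered tiers can be compared against a threshold, and string enums round-trip cleanly through JSON and settings. The `needs_confirmation` policy and its `HIGH` threshold are assumptions made for illustration, not the project's approval rules.

```python
from enum import Enum, IntEnum


class RiskLevel(IntEnum):
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class ApprovalMode(str, Enum):
    FULL_AUTO = "full-auto"
    CONFIRM_DESTRUCTIVE = "confirm-destructive"
    CONFIRM_ALL = "confirm-all"


def needs_confirmation(mode: ApprovalMode, risk: RiskLevel) -> bool:
    """Hypothetical policy; the HIGH threshold is an assumption."""
    if mode is ApprovalMode.CONFIRM_ALL:
        return True
    if mode is ApprovalMode.CONFIRM_DESTRUCTIVE:
        # IntEnum members compare by their integer value.
        return risk >= RiskLevel.HIGH
    return False
```

Parsing a raw value is a plain constructor call, for example `ApprovalMode("full-auto")`, and an unknown string raises `ValueError` instead of silently propagating.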

3. Ports Define Boundaries

Every external dependency (DB, LLM, filesystem) is accessed through an ABC port:

domain/ports/
├── task_repository.py      # CRUD for tasks
├── llm_gateway.py          # LLM completions
├── local_executor.py       # LAEE tool execution
├── audit_logger.py         # Append-only audit log
├── memory_repository.py    # Semantic memory CRUD
├── cost_repository.py      # Cost tracking queries
├── embedding.py            # Text-to-vector embedding (Sprint 3.1)
├── mcp_client.py           # MCP client connections (Sprint 3.8)
├── agent_engine.py         # Agent execution engine interface (Sprint 4.1)
├── tool_registry.py        # MCP tool registry search (Sprint 5.2)
├── tool_installer.py       # Tool install/uninstall (Sprint 5.3)
├── execution_record_repository.py # Execution record CRUD + stats (Sprint 6.1)
├── shared_task_state_repository.py # Shared task state CRUD + list_active (Sprint 7.1)
├── insight_extractor.py    # Insight extraction from agent output (Sprint 7.1)
├── context_adapter.py      # Bidirectional UCL ↔ engine context translation (Sprint 7.2)
└── memory_classifier.py   # Text → CognitiveMemoryType classification (Sprint 11.2)
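
As a minimal sketch of the port pattern, the domain declares an ABC and the infrastructure layer supplies the implementation. The `Task` dataclass and the `save`/`get` method names are assumptions for illustration; the real ports may be async and have different signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    """Simplified stand-in for a domain entity."""
    task_id: str
    goal: str


class TaskRepository(ABC):
    """Port: the domain depends on this interface only (method names assumed)."""

    @abstractmethod
    def save(self, task: Task) -> None: ...

    @abstractmethod
    def get(self, task_id: str) -> Optional[Task]: ...


class InMemoryTaskRepository(TaskRepository):
    """Adapter: lives in the infrastructure layer and can be swapped freely."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def save(self, task: Task) -> None:
        self._tasks[task.task_id] = task

    def get(self, task_id: str) -> Optional[Task]:
        return self._tasks.get(task_id)
```

Use cases receive a `TaskRepository` and never import the concrete class, which is what lets unit tests substitute an in-memory adapter for a database-backed one.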

4. Services are Pure Functions

Domain services (risk_assessor.py, approval_engine.py, semantic_fingerprint.py, forgetting_curve.py, delta_encoder.py, hierarchical_summarizer.py, agent_engine_router.py, tool_safety_scorer.py, failure_analyzer.py, tactical_recovery.py) have:

  • No constructor dependencies (no injected ports)
  • No I/O operations
  • Deterministic output for given input
  • 100% testable without mocks

Infrastructure Layer Notes

ORM ≠ Domain Entity

Domain entities (domain/entities/) and ORM models (infrastructure/persistence/models.py) are deliberately separate:

| Domain Entity | ORM Model | Why Separate |
|---|---|---|
| TaskEntity (Pydantic) | TaskModel (SQLAlchemy) | Domain stays pure. ORM concerns don't leak into business logic |
| MemoryEntry (Pydantic) | MemoryModel (SQLAlchemy) | Different serialization needs |
| CostRecord (Pydantic) | CostLogModel (SQLAlchemy) | ORM has DB-specific features |

Mapping between domain entities and ORM models happens in repository implementations (infrastructure layer).


Context Bridge & MCP (Phase 3, Week 6)

Cross-Platform Context Bridge (Sprint 3.6)

Exports Morphic-Agent memory/context in platform-specific formats. Infrastructure-only (no domain port).

ContextBridge
├── export(platform, query, max_tokens) → ExportResult
└── export_all(query, max_tokens)       → list[ExportResult]
    │
    ├── _gather_context()  → compose MemoryHierarchy + ContextZipper + DeltaEncoder
    └── _format_{platform}()
        ├── claude_code  → CLAUDE.md markdown (## Context / ## Current State / ## Recent Memory)
        ├── chatgpt      → Custom Instructions (What to know / How to respond)
        ├── cursor        → .cursorrules numbered rules + project context
        └── gemini        → <morphic-context> XML tags with structured sections

All ports optional — graceful degradation when memory/zipper/delta unavailable.
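
The dispatch-and-degrade behavior can be sketched as below. Only two of the four formatters are shown, and `GatheredContext`, the formatter bodies, and the `<memory>` tag are hypothetical simplifications; the real output formats are richer than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GatheredContext:
    """Stand-in for what _gather_context() collects; every part may be empty."""
    memories: List[str] = field(default_factory=list)
    state: Dict[str, str] = field(default_factory=dict)


def format_claude_code(ctx: GatheredContext) -> str:
    lines = ["## Context", "", "## Current State"]
    lines += [f"- {key}: {value}" for key, value in ctx.state.items()]
    lines += ["", "## Recent Memory"]
    lines += [f"- {memory}" for memory in ctx.memories]
    return "\n".join(lines)


def format_gemini(ctx: GatheredContext) -> str:
    body = "\n".join(f"<memory>{memory}</memory>" for memory in ctx.memories)
    return f"<morphic-context>\n{body}\n</morphic-context>"


FORMATTERS: Dict[str, Callable[[GatheredContext], str]] = {
    "claude_code": format_claude_code,
    "gemini": format_gemini,
}


def export(platform: str, ctx: Optional[GatheredContext] = None) -> str:
    formatter = FORMATTERS.get(platform)
    if formatter is None:
        raise ValueError(f"unsupported platform: {platform}")
    # Graceful degradation: with no sources available, export an empty skeleton.
    return formatter(ctx if ctx is not None else GatheredContext())
```

Keeping the formatters in a lookup table means adding a platform is one function plus one dictionary entry, with no change to the export path.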

MCP Server (Sprint 3.7)

Exposes Morphic-Agent memory as MCP tools/resources via FastMCP (mcp[cli]>=1.25,<2).

create_mcp_server(container) → FastMCP("morphic-agent")

Tools (6):
├── memory_search    → container.memory.retrieve()
├── memory_add       → container.memory.add()
├── context_compress → container.context_zipper.compress()
├── delta_get_state  → container.delta_encoder.get_state()
├── delta_record     → container.delta_encoder.record()
└── context_export   → container.context_bridge.export()

Resources (2):
├── memory://topics          → delta_encoder.list_topics()
└── memory://state/{topic}   → delta_encoder.get_state(topic)

CLI: morphic mcp server [--transport stdio|streamable-http] [--port 8100]

MCP Client (Sprint 3.8)

Connects to external MCP servers, discovers tools, adapts them for LAEE.

MCPClientPort (domain ABC)
    │
    └── MCPClient (infrastructure)
        ├── connect(name, command, args)  → stdio ClientSession
        ├── list_tools(server)            → tool specs
        ├── call_tool(server, tool, args) → result
        └── disconnect(server)

MCPToolAdapter
    wraps MCP tool as callable with prefix: mcp_{server}_{tool}

discover_and_register(client, configs) → list[MCPToolAdapter]
    batch connect + list tools + create adapters
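
A minimal sketch of the adapter naming and batch discovery described above. `FakeMCPClient` is a hypothetical stand-in so the example runs without an MCP server, and the connect step is omitted; the real `discover_and_register` takes server configs and opens stdio sessions first.

```python
import asyncio
from typing import Any, Dict, List


class FakeMCPClient:
    """Hypothetical stand-in for MCPClient; no real transport is opened."""

    def __init__(self, tools: Dict[str, List[str]]) -> None:
        self._tools = tools

    async def list_tools(self, server: str) -> List[str]:
        return self._tools[server]

    async def call_tool(self, server: str, tool: str, args: Dict[str, Any]) -> str:
        return f"{server}.{tool}({sorted(args)})"


class MCPToolAdapter:
    """Wraps one remote tool as an awaitable callable named mcp_{server}_{tool}."""

    def __init__(self, client: FakeMCPClient, server: str, tool: str) -> None:
        self._client, self._server, self._tool = client, server, tool
        self.name = f"mcp_{server}_{tool}"

    async def __call__(self, **args: Any) -> str:
        return await self._client.call_tool(self._server, self._tool, args)


async def discover_and_register(
    client: FakeMCPClient, servers: List[str]
) -> List[MCPToolAdapter]:
    adapters: List[MCPToolAdapter] = []
    for server in servers:
        for tool in await client.list_tools(server):
            adapters.append(MCPToolAdapter(client, server, tool))
    return adapters
```

The server prefix in the adapter name keeps tools from different servers from colliding when they share a tool name such as `read`.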

Testing Strategy

| Test Type | Location | Dependencies | Speed | What It Tests |
|---|---|---|---|---|
| Unit/Domain | tests/unit/domain/ | None | ~0.03s | Entities, value objects, services, LSH fingerprint, forgetting curve, delta encoder, hierarchical summarizer, agent engine router, tool safety scorer, failure analyzer, tactical recovery |
| Unit/Application | tests/unit/application/ | Mocked ports | Fast | Use case orchestration, cost estimator, planning, engine routing, tool install, tool discovery, Ollama management, analyze execution, update strategy, systemic evolution |
| Unit/Infra | tests/unit/infrastructure/ | Mocked ports | ~1.2s | LLM gateway, cost, task graph, LAEE, memory, PG repos, Celery, browser/gui/cron, semantic search, forgetting curve, delta encoder, hierarchical summarizer, context bridge, MCP server/client, ADK driver, knowledge loader, MCP registry, tool installer, execution record repo, strategy store |
| Unit/Interface | tests/unit/interface/ | Mock container | ~1.4s | API, WebSocket, CORS, CLI commands, plan endpoints, engine endpoints, marketplace endpoints, evolution endpoints |
| Integration | tests/integration/ | Ollama running | ~18s (37 tests) | Real LLM inference, real filesystem, L1-L4 memory hierarchy, agent engine live tests |
| E2E | tests/e2e/ | Full stack | Slowest | API/CLI → Use Case → DB round-trips |

TDD Process

1. Red:      Write test that fails
2. Green:    Write minimum code to pass
3. Refactor: Clean up while tests protect

Current: 1162 unit tests + 37 integration tests = 1199 total, 100% pass (~5s)
Lint: ruff check 0 errors, ruff format 245 files clean
Default model: qwen3-coder:30b (thinking mode disabled via extra_body)
Cloud providers verified: Anthropic (Haiku/Sonnet), OpenAI (o4-mini/o3), Gemini (3-flash/3-pro)

Phase 3 Verification (2026-02-26)

✓ Unit tests:       699 passed (3.51s)
✓ Integration tests: 16 passed (0.22s) — L1-L4 full lifecycle
✓ Lint:             ruff check 0 errors, ruff format 169 files clean
✓ FastAPI app:      21 routes (incl. GET /api/memory/export)
✓ AppContainer DI:  MemoryHierarchy, ContextZipper, DeltaEncoderManager,
                    ForgettingCurveManager, ContextBridge, MCPClient — all wired
✓ MCP Server:       6 tools + 1 resource + 1 template registered
✓ CLI:              morphic {task,plan,model,cost,mcp} — 5 subcommands
✓ Context Bridge:   4 platforms output verified (claude_code, chatgpt, cursor, gemini)

Phase 4 Verification — Sprint 4.1–4.3 (2026-03-02)

✓ Unit tests:       864 passed (6.49s)
✓ Lint:             ruff check 0 errors, ruff format 192 files clean
✓ Domain:           AgentEngineType (6), AgentEnginePort ABC, AgentEngineRouter (pure static)
✓ Drivers:          5 implementations (Ollama, ClaudeCode, Codex, Gemini, OpenHands)
✓ Use case:         RouteToEngineUseCase — list/get/execute with fallback chain
✓ API:              GET /api/engines, GET /api/engines/{type}, POST /api/engines/run
✓ CLI:              morphic engine {list, run} — 6 subcommands total
✓ AppContainer DI:  agent_drivers dict + RouteToEngineUseCase wired
✓ New tests:        43 (23 use case + 12 API + 8 CLI)

Phase 4 Verification — Sprint 4.4 Integration Tests (2026-03-03)

✓ Unit tests:       864 passed (3.89s), 0 regressions
✓ Integration tests: 30 total (18 passed, 12 skipped, 0 failed)
✓ Lint:             ruff check 0 errors, ruff format 193 files clean
✓ Test classes:     9 classes covering all 5 engines + routing + fallback
✓ Availability:     is_available() verified for all 5 engines
✓ Disabled drivers: ClaudeCode/Codex enabled=False → proper failure result
✓ Skip pattern:     Graceful skip on env/auth issues (nested session, 401)
✓ Cross-engine:     Same task comparison table with success/duration/cost
✓ Routing:          budget=0→OLLAMA, SIMPLE_QA→OLLAMA, COMPLEX→fallback chain
✓ Fallback:         LONG_RUNNING_DEV→OpenHands or fallback if unavailable

Completion Criteria:
  1. ✓ Same task across multiple engines with result comparison
  2. ✓ Task-type-based automatic engine selection (5 routing tests)
  3. ✓ Availability check + fallback in live environment (fallback chain)
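
The three routing rules verified above (budget=0 goes to Ollama, SIMPLE_QA goes to Ollama, LONG_RUNNING_DEV prefers OpenHands with fallback) can be sketched as a chain lookup plus an availability scan. The chain contents beyond those three rules, the `DEFAULT_CHAIN`, and the function names are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, List, Set


class Engine(str, Enum):
    OPENHANDS = "openhands"
    CLAUDE_CODE = "claude_code"
    OLLAMA = "ollama"


# Preference chains are assumptions, apart from the three rules listed above.
CHAINS: Dict[str, List[Engine]] = {
    "simple_qa": [Engine.OLLAMA],
    "long_running_dev": [Engine.OPENHANDS, Engine.CLAUDE_CODE, Engine.OLLAMA],
}
DEFAULT_CHAIN: List[Engine] = [Engine.CLAUDE_CODE, Engine.OLLAMA]


def select_chain(task_type: str, budget: float) -> List[Engine]:
    """Pure selection step: no I/O, same output for the same input."""
    if budget <= 0:
        return [Engine.OLLAMA]  # zero budget forces the free local engine
    return list(CHAINS.get(task_type, DEFAULT_CHAIN))


def route(task_type: str, budget: float, available: Set[Engine]) -> Engine:
    """Walk the chain and return the first engine that is currently available."""
    for engine in select_chain(task_type, budget):
        if engine in available:
            return engine
    raise RuntimeError("no engine available")
```

Separating `select_chain` from `route` mirrors the split between a pure router and a use case that performs availability checks.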

Phase 4 Verification — Sprint 4.5-4.6 (2026-03-05)

✓ Unit tests:       898 passed (3.53s), 0 regressions
✓ Lint:             ruff check 0 errors, ruff format 197 files clean

Sprint 4.5 — ADK Driver:
  ✓ ADKDriver:        google-adk optional dep with try-import guard (_ADK_AVAILABLE)
  ✓ LlmAgent→Runner:  run_async wrapper with InMemorySessionService
  ✓ Capabilities:     2M tokens, parallel=True, mcp=True, streaming=True, $0/hr
  ✓ Container wired:  AgentEngineType.ADK → ADKDriver in AppContainer
  ✓ Config:           adk_enabled, adk_default_model settings
  ✓ Integration:      TestADKEngineLive (3 tests) + availability detection
  ✓ New tests:        17 unit (4 classes: construct, available, run_task, capabilities)

Sprint 4.6 — Knowledge File Management:
  ✓ KnowledgeFileLoader:  engine→file mapping (CLAUDE.md, AGENTS.md, llms-full.txt)
  ✓ Context injection:     RouteToEngineUseCase.execute(context=...) prepends to task
  ✓ API:                   EngineRunRequest.context field → POST /api/engines/run
  ✓ CLI:                   --context / -c flag on morphic engine run
  ✓ Backward compatible:   No ABC changes, no driver signature changes
  ✓ New tests:             13 knowledge loader + 4 context injection = 17

Phase 4 COMPLETE: 6 drivers, router, use case, API/CLI, knowledge injection
  Total new tests: 34 (Sprint 4.5: 17 unit + Sprint 4.6: 17 unit)

Phase 5 Verification — Sprint 5.1–5.6 (2026-03-05)

✓ Unit tests:       988 passed (4.90s), 0 regressions
✓ Lint:             ruff check 0 errors, ruff format 221 files clean

Sprint 5.1 — Tool Safety Scorer:
  ✓ SafetyTier:          IntEnum (UNSAFE=0, EXPERIMENTAL=1, COMMUNITY=2, VERIFIED=3)
  ✓ ToolCandidate:       Pydantic entity (strict=True, validate_assignment=True)
  ✓ ToolSafetyScorer:    score() — publisher trust (0.40) + transport (0.25) + popularity (0.15) + metadata (0.20)
  ✓ Suspicious patterns: Forced UNSAFE for shell_exec, eval, rm -rf patterns
  ✓ New tests:           22 (8 entity + 14 scorer)
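
A sketch of the weighted score using the four weights listed for `ToolSafetyScorer`. The signal names, the regex, and the tier cut-offs (0.8 and 0.5) are assumptions for illustration; only the weights and the forced-UNSAFE rule come from the notes above.

```python
import re
from enum import IntEnum
from typing import Dict, Tuple


class SafetyTier(IntEnum):
    UNSAFE = 0
    EXPERIMENTAL = 1
    COMMUNITY = 2
    VERIFIED = 3


WEIGHTS = {"publisher": 0.40, "transport": 0.25, "popularity": 0.15, "metadata": 0.20}
SUSPICIOUS = re.compile(r"shell_exec|\beval\b|rm\s+-rf")


def score(signals: Dict[str, float], description: str) -> Tuple[float, SafetyTier]:
    """signals maps each weight key to a value in [0, 1]; missing keys count as 0."""
    if SUSPICIOUS.search(description):
        # A suspicious pattern overrides every other signal.
        return 0.0, SafetyTier.UNSAFE
    total = sum(weight * signals.get(key, 0.0) for key, weight in WEIGHTS.items())
    # Tier cut-offs below are assumed for illustration, not taken from the project.
    if total >= 0.8:
        tier = SafetyTier.VERIFIED
    elif total >= 0.5:
        tier = SafetyTier.COMMUNITY
    else:
        tier = SafetyTier.EXPERIMENTAL
    return total, tier
```

Because the weights sum to 1.0, the score stays in [0, 1] whenever each signal does, which keeps a threshold setting such as `marketplace_safety_threshold` easy to reason about.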

Sprint 5.2 — MCP Registry Search:
  ✓ ToolRegistryPort:    ABC + ToolSearchResult dataclass
  ✓ MCPRegistryClient:   httpx GET registry.modelcontextprotocol.io, auto-score results
  ✓ Error handling:      Returns empty ToolSearchResult on HTTP failure (never raises)
  ✓ Response parsing:    Handles both list and dict formats
  ✓ New tests:           14

Sprint 5.3 — Tool Installer + Use Case + API/CLI:
  ✓ ToolInstallerPort:   ABC + InstallResult dataclass
  ✓ MCPToolInstaller:    Safety gate (refuses UNSAFE), subprocess npm/pip, in-memory tracking
  ✓ InstallToolUseCase:  search + score + install coordination
  ✓ API endpoints:       GET /search, POST /install, GET /installed, DELETE /{name}
  ✓ CLI commands:        morphic marketplace {search|install|list|uninstall}
  ✓ Config:              marketplace_enabled, marketplace_auto_install, marketplace_safety_threshold, mcp_registry_url
  ✓ New tests:           36 (11 installer + 9 use case + 10 API + 6 CLI)

Sprint 5.4 — Auto Tool Discoverer:
  ✓ FailureAnalyzer:     Regex error pattern → search query mapping (pure domain)
  ✓ DiscoverToolsUseCase: suggest_for_failure() — top-3 queries, dedup, sort by score
  ✓ API:                 POST /api/marketplace/suggest
  ✓ CLI:                 morphic marketplace suggest
  ✓ New tests:           21 (12 analyzer + 9 use case)
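
The failure-to-suggestion flow can be sketched in two pure steps: map error text to search queries, then merge registry results. The patterns, query templates, and function names are hypothetical; only the shape (regex mapping, at most three queries, dedup, sort by score) follows the notes above.

```python
import re
from typing import Dict, List, Tuple

# Pattern and query pairs are hypothetical examples, not the project's rule set.
ERROR_PATTERNS = [
    (re.compile(r"No module named '([\w.]+)'"), "{0} python"),
    (re.compile(r"command not found: (\w+)"), "{0} cli"),
    (re.compile(r"(?i)connection refused"), "network diagnostics"),
]


def extract_queries(error: str, limit: int = 3) -> List[str]:
    """Turn raw error text into at most `limit` distinct search queries."""
    queries: List[str] = []
    for pattern, template in ERROR_PATTERNS:
        for match in pattern.finditer(error):
            query = template.format(*match.groups())
            if query not in queries:
                queries.append(query)
    return queries[:limit]


def merge_candidates(batches: List[List[Tuple[str, float]]]) -> List[Tuple[str, float]]:
    """Deduplicate (name, score) results by name, keep the best score, sort descending."""
    best: Dict[str, float] = {}
    for batch in batches:
        for name, value in batch:
            best[name] = max(value, best.get(name, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Both functions are free of I/O, so the registry lookup between them is the only part that needs a mocked port in tests.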

Sprint 5.5 — Ollama Model Manager Extended:
  ✓ OllamaManager:       +delete_model(), +model_info(), +get_running_models()
  ✓ ManageOllamaUseCase: status/pull/delete/info/switch_default
  ✓ API:                 POST /pull, DELETE /{name}, POST /switch, GET /running, GET /{name}/info
  ✓ CLI:                 morphic model {delete|switch|info}
  ✓ New tests:           16 (6 manager + 10 use case)

Sprint 5.6 — Marketplace UI (Next.js):
  ✓ SearchBar:           Debounced search input (400ms)
  ✓ ToolCard:            Search result card (SafetyBadge + InstallButton)
  ✓ SafetyBadge:         Color-coded safety tier (4 tiers)
  ✓ InstallButton:       Install/uninstall with confirm dialog
  ✓ Marketplace page:    Search + browse + installed tab
  ✓ Models page:         Pull/delete/switch + running status display
  ✓ Navigation:          Header links to Marketplace + Models

Sprint 5.7a — Model Management Test Coverage:
  ✓ _MockContainer:    +manage_ollama mock in test_api.py and test_cli.py
  ✓ API tests (9):     pull/delete/switch/info success+failure, running models
  ✓ CLI tests (6):     delete/switch/info success+failure

Sprint 5.7b — Auto-Discovery Trigger on Task Failure:
  ✓ ExecuteTaskUseCase: +discover_tools optional param, +_safe_suggest_tools()
  ✓ Fire-and-forget:   Triggered on FAILED/FALLBACK, never blocks execution
  ✓ Container wiring:  discover_tools=self.discover_tools (1 line)
  ✓ New tests (8):     suggests on failure, not on success, on fallback,
                        failure doesn't block, None safe, combined errors,
                        passes task goal, skipped when no errors
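
A minimal sketch of the guard behind `_safe_suggest_tools()`, assuming the discovery dependency is an optional async callable. The signature and the `discover` argument are hypothetical; the point is that a missing dependency, an empty error list, or an exception in discovery all collapse to an empty result.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger(__name__)

Discover = Callable[[str, str], Awaitable[List[str]]]


async def safe_suggest_tools(
    discover: Optional[Discover], goal: str, errors: List[str]
) -> List[str]:
    """Return suggestions, or [] if discovery is absent, skipped, or fails."""
    if discover is None or not errors:
        return []
    try:
        return await discover(goal, " ".join(errors))
    except Exception:
        # Discovery is best-effort: log and continue so the task result stands.
        logger.warning("tool discovery failed", exc_info=True)
        return []
```

Catching broadly is deliberate here because the caller is already handling a failed task, and a second failure from an optional feature should not mask the first.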

Phase 5 COMPLETE: Safety scoring, MCP registry, tool installer, auto discoverer, Ollama manager, marketplace UI, model mgmt tests, auto-discovery trigger
  Total new tests: ~132 (Sprint 5.1: 22 + 5.2: 14 + 5.3: 36 + 5.4: 21 + 5.5: 16 + 5.7: 23)
  New API endpoints: 8, New CLI commands: 8, New settings: 4

Phase 6 Verification — Sprint 6.1–6.5 (2026-03-05)

✓ Unit tests:       1162 passed (4.82s), 0 failures (MCP server tests fixed)
✓ Lint:             ruff check 0 errors, ruff format 245 files clean

Sprint 6.1 — Execution Recorder + Domain Foundation:
  ✓ ExecutionRecord:   Pydantic entity (strict=True) — task_type, engine, model, success, cost, cache_hit_rate, user_rating
  ✓ EvolutionLevel:    str Enum (tactical, strategic, systemic)
  ✓ ExecutionRecordRepository: ABC + ExecutionStats dataclass (success_rate, avg_cost, total_count)
  ✓ InMemoryExecutionRecordRepository: List-backed CRUD + filtering + stats aggregation
  ✓ AnalyzeExecutionUseCase: record() + get_stats() + get_failure_patterns() + get_model_distribution()

Sprint 6.2 — Tactical Recovery (Level 1):
  ✓ TacticalRecovery:  Pure domain service — find_alternative(), create_rule_from_recovery(), rank_rules()
  ✓ Pattern matching:  Regex-based error→alternative tool mapping
  ✓ RecoveryRule:      error_pattern + alternative_tool + success_rate + sample_size

Sprint 6.3 — Strategy Updater (Level 2):
  ✓ Strategy entities: RecoveryRule, ModelPreference, EnginePreference (all strict Pydantic)
  ✓ StrategyStore:     JSONL persistence (recovery_rules.jsonl, model_preferences.jsonl, engine_preferences.jsonl)
  ✓ UpdateStrategyUseCase: update_model_preferences() + update_engine_preferences() + update_recovery_rules() + run_full_update()
  ✓ Min samples filter: Configurable (default 10) to avoid overfitting
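
JSONL persistence with a minimum-sample filter can be sketched with the standard library alone. The `ModelPreference` fields and the helper names are assumptions for illustration; the project's entities are strict Pydantic models and `StrategyStore` has its own API.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class ModelPreference:
    """Hypothetical fields; the real entity is a strict Pydantic model."""
    task_type: str
    model: str
    success_rate: float
    sample_size: int


def save_jsonl(path: Path, prefs: List[ModelPreference]) -> None:
    """Write one JSON object per line, replacing the file contents."""
    with path.open("w", encoding="utf-8") as fh:
        for pref in prefs:
            fh.write(json.dumps(asdict(pref)) + "\n")


def load_jsonl(path: Path) -> List[ModelPreference]:
    """Read preferences back; a missing file is treated as an empty store."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [ModelPreference(**json.loads(line)) for line in fh if line.strip()]


def filter_min_samples(
    prefs: List[ModelPreference], min_samples: int = 10
) -> List[ModelPreference]:
    """Drop preferences backed by too few executions to be trusted."""
    return [pref for pref in prefs if pref.sample_size >= min_samples]
```

The filter matters because a model with one success out of one attempt would otherwise outrank a model with ninety successes out of a hundred.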

Sprint 6.4 — Systemic Evolver (Level 3):
  ✓ SystemicEvolutionUseCase: identify_tool_gaps() + suggest_tools_for_gaps() + run_evolution()
  ✓ Composes:         AnalyzeExecution + UpdateStrategy + DiscoverTools (Phase 5)
  ✓ EvolutionReport:  level + strategy_update + tool_gaps_found + tools_suggested + summary

Sprint 6.5 — Evolution Interface (API + CLI + UI):
  ✓ API endpoints:    GET /stats, GET /failures, GET /preferences, POST /update, POST /evolve
  ✓ CLI commands:     morphic evolution {stats|failures|update|report}
  ✓ UI page:          Evolution dashboard scaffold (stats, failures, preferences)
  ✓ AppContainer DI:  execution_record_repo + strategy_store + 3 use cases wired
  ✓ Config:           evolution_enabled, evolution_strategy_dir, evolution_min_samples

Phase 6 COMPLETE: 3-tier self-evolution (tactical recovery + strategy learning + systemic evolution)
  Total new tests: ~174 (domain + application + infrastructure + interface)
  New API endpoints: 5, New CLI commands: 4, New settings: 4
  Also fixed: 19 pre-existing test_mcp_server.py failures (_FakeSettings updated)

Phase 7 Verification — Sprint 7.1 (2026-03-10)

✓ Unit tests:       1230 passed (6.18s), 0 failures
✓ New tests:        68 (cognitive domain)
✓ Lint:             ruff check 0 errors, ruff format 251 files clean

Sprint 7.1 — UCL Domain Foundation:
  ✓ CognitiveMemoryType:         str Enum (episodic, semantic, procedural, working)
  ✓ Decision:                    Pydantic entity — description, rationale, agent_engine, confidence [0,1]
  ✓ AgentAction:                 Pydantic entity — agent_engine, action_type (str), cost_usd, duration
  ✓ SharedTaskState:             Pydantic entity — task_id, decisions[], artifacts{}, blockers[], agent_history[]
                                 Methods: add_decision, add_action, add_artifact, add/remove_blocker
                                 Properties: last_agent, total_cost_usd
  ✓ AgentAffinityScore:          Pydantic entity — engine, topic, familiarity/recency/success/cost [0,1]
  ✓ SharedTaskStateRepository:   ABC port (7 methods: save, get, list_active, update_decisions/artifacts, append_action, delete)
  ✓ InsightExtractorPort:        ABC port + ExtractedInsight dataclass
  ✓ AgentAffinityScorer:         Pure static service — score()/rank()/select_best() with 4-factor weighting

Phase 7 Verification — Sprint 7.3 (2026-03-11)

✓ Unit tests:       1438 passed (4.40s), 0 failures
✓ New tests:        88 (30 domain + 15 application + 13 infrastructure + 5 execute_task extension + 25 domain)
✓ Lint:             ruff check 0 errors, ruff format 272 files clean

Sprint 7.3 — Insight Extraction Pipeline:
  ✓ MemoryClassifier:             Pure static service — classify()/classify_with_confidence()
                                  4 regex patterns (PROCEDURAL > SEMANTIC > WORKING > EPISODIC priority)
                                  Confidence: min(0.3 + hits * 0.2, 0.9)
  ✓ ConflictResolver:             Pure static service — detect_conflicts()/resolve()/resolve_all()
                                  3 criteria: different engine + Jaccard overlap ≥ 0.4 + negation contrast
                                  Resolution: higher confidence wins, tie → first (stable)
                                  ConflictPair dataclass for conflict audit trail
  ✓ InsightExtractor:             InsightExtractorPort impl — adapter lookup + dedup + reclassification
                                  Dedup: normalised content (strip+lowercase)
                                  Low confidence (<0.5) → MemoryClassifier override
  ✓ ExtractInsightsUseCase:       Full pipeline: extract → ConflictResolver.resolve_all → MemoryEntry storage → SharedTaskState update
                                  CognitiveMemoryType → MemoryType mapping:
                                    EPISODIC/PROCEDURAL → L2_SEMANTIC, SEMANTIC → L3_FACTS, WORKING → L1_ACTIVE
                                  "decision" tag → state.add_decision(), "artifact"/"file" tag → state.add_artifact()
  ✓ ExecuteTaskUseCase:           +extract_insights optional param, _safe_extract_insights hook (try/except, never blocks)
                                  Gathers subtask results + errors into combined output
  ✓ AppContainer wiring:          6 context adapters → InsightExtractor → ExtractInsightsUseCase → ExecuteTaskUseCase
                                  SharedTaskStateRepository (in-memory) added to container
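
The confidence formula and the storage mapping stated above can be written out directly. The function names, `STORAGE_TIER`, and the `hits` argument (number of pattern matches) are naming assumptions; the formula, the 0.5 override threshold, and the mapping are taken from the Sprint 7.3 notes.

```python
from enum import Enum


class CognitiveMemoryType(str, Enum):
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    WORKING = "working"


class MemoryType(str, Enum):
    L1_ACTIVE = "l1_active"
    L2_SEMANTIC = "l2_semantic"
    L3_FACTS = "l3_facts"
    L4_COLD = "l4_cold"


# EPISODIC/PROCEDURAL -> L2_SEMANTIC, SEMANTIC -> L3_FACTS, WORKING -> L1_ACTIVE
STORAGE_TIER = {
    CognitiveMemoryType.EPISODIC: MemoryType.L2_SEMANTIC,
    CognitiveMemoryType.PROCEDURAL: MemoryType.L2_SEMANTIC,
    CognitiveMemoryType.SEMANTIC: MemoryType.L3_FACTS,
    CognitiveMemoryType.WORKING: MemoryType.L1_ACTIVE,
}


def confidence(hits: int) -> float:
    """min(0.3 + hits * 0.2, 0.9): starts at 0.3 and is capped at 0.9."""
    return min(0.3 + hits * 0.2, 0.9)


def needs_reclassification(adapter_confidence: float) -> bool:
    """Insights below 0.5 confidence are handed to MemoryClassifier."""
    return adapter_confidence < 0.5
```

Under this formula the cap is reached at three hits, so additional matches beyond that do not raise confidence further.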

Phase 7 Verification — Sprint 7.4 (2026-03-11)

✓ Unit tests:       1528 passed (5.0s), 0 failures
✓ New tests:        90 (6 port + 17 domain + 18 infra + 16 route_to_engine + 23 handoff + 10 router)
✓ Lint:             ruff check 0 errors, ruff format 280 files clean

Sprint 7.4 — Affinity-Aware Routing + Task Handoff:
  ✓ AgentAffinityRepository:      New ABC port — 5 methods (get, get_by_topic, get_by_engine, upsert, list_all)
                                  Separate from SharedTaskStateRepository (engine×topic keyed)
  ✓ TopicExtractor:               Pure static service — keyword-based topic extraction
                                  10 topics (frontend/backend/database/devops/testing/security/ml/data/docs/refactoring)
                                  Falls back to "general" if no match
  ✓ select_with_affinity():       New AgentEngineRouter method — affinity-aware reranking
                                  Base chain + AgentAffinityScorer.rank() + boost_threshold gate
                                  OLLAMA always last, budget=0 → [OLLAMA], dedup preserved
  ✓ InMemoryAffinityRepository:   Dict keyed by (engine, topic) tuple
  ✓ JSONLAffinityStore:           JSONL file persistence (lazy-load + full-overwrite on upsert)
  ✓ RouteToEngineUseCase:         Extended with affinity-aware routing + adapter context injection +
                                  post-success affinity updates + action recording to SharedTaskState
                                  All new params optional (None) → full backward compatibility
  ✓ HandoffTaskUseCase:           Cross-agent handoff: load state → record handoff action → add decision →
                                  merge artifacts → adapter inject → execute via RouteToEngine →
                                  record received_handoff → optional insight extraction → persist
  ✓ AppContainer wiring:          InMemoryAffinityRepo + RouteToEngine(adapters, affinity, state) +
                                  HandoffTaskUseCase(route_to_engine, state, adapters, insights)
  ✓ Settings:                     +affinity_min_samples (3) + affinity_boost_threshold (0.6)
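The reranking rules listed for select_with_affinity() can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: engine names are plain strings, the `scores` mapping stands in for the output of AgentAffinityScorer.rank(), the function signature is hypothetical, and the affinity_min_samples gate is assumed to have been applied before scores reach this function.

```python
from __future__ import annotations


def select_with_affinity(
    base_chain: list[str],
    scores: dict[str, float],
    budget: float,
    boost_threshold: float = 0.6,  # mirrors affinity_boost_threshold
) -> list[str]:
    """Rerank a fallback chain so engines with strong affinity come first."""
    if budget <= 0:
        return ["ollama"]  # budget=0: local engine only

    # Deduplicate while preserving the base order.
    chain = list(dict.fromkeys(base_chain))

    # Only engines at or above the threshold are promoted (boost gate).
    boosted = sorted(
        (e for e in chain if e != "ollama" and scores.get(e, 0.0) >= boost_threshold),
        key=lambda e: scores[e],
        reverse=True,
    )
    rest = [e for e in chain if e != "ollama" and e not in boosted]

    # Assumption: OLLAMA is appended last even if the base chain omitted it.
    return boosted + rest + ["ollama"]


chain = select_with_affinity(
    ["claude_code", "gemini", "codex", "gemini", "ollama"],
    {"codex": 0.9, "gemini": 0.7, "claude_code": 0.3},
    budget=1.0,
)
print(chain)  # ['codex', 'gemini', 'claude_code', 'ollama']
```

Engines below the threshold keep their base-chain order, so a weak or missing affinity signal never reshuffles the default routing.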

Phase 7 Verification — Sprint 7.5 (2026-03-11)

✓ Unit tests:       1558 passed (5.31s), 0 failures
✓ New tests:        30 (19 API + 11 CLI)
✓ Lint:             ruff check 0 errors, ruff format clean

Sprint 7.5 — UCL API + CLI + UI:
  ✓ API schemas:          DecisionResponse, AgentActionResponse, SharedTaskStateResponse (with from_state()),
                          SharedTaskStateListResponse, AffinityScoreResponse (with from_affinity()),
                          AffinityListResponse, HandoffRequestSchema, HandoffResponseSchema (with from_result()),
                          InsightResponse (with from_insight()), InsightListResponse, InsightExtractRequest
  ✓ API routes:           GET /api/cognitive/state (list), GET /api/cognitive/state/{task_id},
                          DELETE /api/cognitive/state/{task_id}, GET /api/cognitive/affinity (?topic, ?engine),
                          POST /api/cognitive/handoff, POST /api/cognitive/insights/extract
  ✓ CLI commands:         morphic cognitive {state|delete|affinity|handoff|insights}
                          state: list all or show by task_id arg
                          affinity: --topic/-t, --engine/-e filters
                          handoff: --task-id, --source, --reason, --target, --budget
                          insights: --task-id, --engine, --output
  ✓ CLI formatters:       print_shared_state() (rich Tree), print_state_list_table() (rich Table),
                          print_affinity_table() (rich Table, color-coded scores)
  ✓ UI page:              /cognitive — tab UI (Shared States + Affinity Scores)
                          StateCard (task_id, last_agent, counts, cost)
                          StateDetail (decisions, artifacts, blockers, agent history)
                          AffinityTable (engine, topic, score, success rate, samples)
  ✓ API client:           getCognitiveStates(), getCognitiveState(), deleteCognitiveState(),
                          getAffinityScores() — TypeScript types + fetch wrappers
  ✓ Navigation:           Header link to Cognitive, version bumped to v0.5.0-alpha
  ✓ Tests:                19 API tests (state CRUD, affinity filter, handoff valid/invalid, insights)
                          11 CLI tests (state list/show, delete, affinity filter, insights invalid)
                          Patch target: interface.cli.commands.cognitive._get_container

Phase 7 Verification — Sprint 7.6 (2026-03-11)

✓ Unit tests:       1569 passed (6.02s), 0 failures
✓ Integration tests: 13 new (50 total), 0 failures
✓ New unit tests:   11 (7 API + 4 CLI)
✓ Lint:             ruff check 0 errors, ruff format clean

Sprint 7.6 — Integration Testing + Benchmarks:
  ✓ Cross-engine integration tests:
      TestFullHandoffPipeline       3 tests (state preservation, insight extraction, chained A→B→C)
      TestContextAdapterFidelity    3 tests (nonempty context, extract insights, roundtrip key info)
      TestInsightExtractionRoundTrip 2 tests (stored in memory, updates task state)
      TestAffinityLearning          2 tests (updates after execution, builds with multiple runs)
      TestConflictResolution        1 test (conflicting insights resolved by confidence)
      TestContextContinuityBenchmark 2 tests (continuity >85%, dedup >50%)
  ✓ Context continuity benchmark:   97.2% overall (target >85%)
      AdapterScore dataclass (engine, decisions/artifacts/blockers injected/found, context_length)
      ContinuityResult (adapter_scores, overall_score property)
      _build_reference_state() — 5 decisions, 4 artifacts, 3 blockers, 3 agent actions
  ✓ Memory dedup benchmark:         57.1% overall (target >50%)
      DedupScore (scenario, engine_a/b, raw counts, deduped_count, dedup_rate)
      DedupResult (scores, overall_accuracy property)
      3 scenarios: overlapping_facts, unique_outputs, case_variation
  ✓ Benchmark runner:               BenchmarkSuiteResult, async run_all()
  ✓ Benchmark API:                  POST /api/benchmarks/{run|continuity|dedup}
      BenchmarkResultResponse schema (overall_score, context_continuity, dedup_accuracy, errors, timestamp)
  ✓ Benchmark CLI:                  morphic benchmark {run|continuity|dedup}
      Rich table output with color-coded scores
  ✓ Benchmark UI:                   /benchmarks page (ScoreBar, adapter table, dedup table)
  ✓ API client:                     runBenchmarks(), runContinuityBenchmark(), runDedupBenchmark()
  ✓ Item 4 (A2A protocol):          Skipped — UCL already provides cross-engine communication
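One plausible way to compute the continuity score from the AdapterScore fields is sketched below. The field names follow the description above, but the per-adapter `score` property and the use of a plain mean for `overall_score` are assumptions; the actual benchmark may weight items differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdapterScore:
    engine: str
    decisions_injected: int
    decisions_found: int
    artifacts_injected: int
    artifacts_found: int
    blockers_injected: int
    blockers_found: int
    context_length: int

    @property
    def score(self) -> float:
        """Fraction of injected items found again (hypothetical helper)."""
        injected = self.decisions_injected + self.artifacts_injected + self.blockers_injected
        found = self.decisions_found + self.artifacts_found + self.blockers_found
        return found / injected if injected else 0.0


@dataclass
class ContinuityResult:
    adapter_scores: list[AdapterScore] = field(default_factory=list)

    @property
    def overall_score(self) -> float:
        """Assumed aggregation: unweighted mean over adapters."""
        if not self.adapter_scores:
            return 0.0
        return sum(s.score for s in self.adapter_scores) / len(self.adapter_scores)


# Reference state: 5 decisions, 4 artifacts, 3 blockers (12 items).
full = AdapterScore("claude_code", 5, 5, 4, 4, 3, 3, context_length=1200)
partial = AdapterScore("ollama", 5, 5, 4, 3, 3, 3, context_length=400)
result = ContinuityResult([full, partial])
print(f"{result.overall_score:.1%}")  # 95.8%
```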

PHASE 7 COMPLETE — All 6 sprints done, all completion criteria met.

Unified Cognitive Layer (UCL) — Phase 7 Architecture

Key Insight: Individual AI agents are cognitive islands. The real power isn't a better model — it's shared cognition across all agents. UCL is the "connective tissue" that makes Morphic-Agent more than the sum of its parts.

Why UCL > Simple A2A

| Approach | What it shares | What it doesn't |
|---|---|---|
| A2A Protocol | Task requests | Memory, decisions, context, artifacts |
| MCP | Tool access | Reasoning, context, task state |
| UCL | Memory + Tasks + Decisions + Artifacts + Context | |

UCL Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   Unified Cognitive Layer (UCL)                  │
│                                                                 │
│  ┌────────────────────┐        ┌────────────────────┐          │
│  │  Shared Task State  │        │  Shared Memory      │          │
│  │  goals, subtasks,   │        │  (4-type hierarchy) │          │
│  │  decisions, artifacts│        │  episodic, semantic, │          │
│  │  blockers, history  │        │  procedural, working│          │
│  └────────────────────┘        └────────────────────┘          │
│                                                                 │
│  ┌────────────────────┐        ┌────────────────────┐          │
│  │  Context Adapters   │        │  Insight Extractor  │          │
│  │  (bidirectional,    │        │  (post-execution,   │          │
│  │   per-engine)       │        │   auto-store)       │          │
│  └────────────────────┘        └────────────────────┘          │
│                                                                 │
│  ┌────────────────────┐        ┌────────────────────┐          │
│  │  Agent Affinity     │        │  Conflict Resolver  │          │
│  │  (context-fit)      │        │  (confidence-weight)│          │
│  └────────────────────┘        └────────────────────┘          │
└─────────────────────────────────────────────────────────────────┘
         ↕              ↕              ↕              ↕
   Claude Code     Gemini CLI     Codex CLI      Ollama ...

New Domain Components (Phase 7)

Entities (domain/entities/cognitive.py):

  • Decision — rationale, agent_engine, confidence, timestamp
  • AgentAction — action_type, summary, cost_usd
  • SharedTaskState — task_id, decisions, artifacts, blockers, agent_history
  • AgentAffinityScore — engine, topic, familiarity, recency, success_rate

Value Objects (domain/value_objects/cognitive.py):

  • CognitiveMemoryType — EPISODIC, SEMANTIC, PROCEDURAL, WORKING

Ports (new ABCs):

  • SharedTaskStateRepository — CRUD for shared task state
  • InsightExtractorPort — extract structured insights from agent output
  • ContextAdapterPort — bidirectional context translation per engine
  • AgentAffinityRepository — engine×topic affinity score persistence (Sprint 7.4)
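As an illustration of the port/adapter split, the AgentAffinityRepository port and its in-memory implementation might look like the sketch below. The five method names and the (engine, topic) dict key come from the Sprint 7.4 notes; the method signatures, the synchronous style, and the use of a stdlib dataclass instead of a Pydantic model are assumptions made to keep the example self-contained.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAffinityScore:
    engine: str
    topic: str
    familiarity: float = 0.0
    recency: float = 0.0
    success_rate: float = 0.0


class AgentAffinityRepository(ABC):
    """Port: persistence for engine x topic affinity scores."""

    @abstractmethod
    def get(self, engine: str, topic: str) -> AgentAffinityScore | None: ...

    @abstractmethod
    def get_by_topic(self, topic: str) -> list[AgentAffinityScore]: ...

    @abstractmethod
    def get_by_engine(self, engine: str) -> list[AgentAffinityScore]: ...

    @abstractmethod
    def upsert(self, score: AgentAffinityScore) -> None: ...

    @abstractmethod
    def list_all(self) -> list[AgentAffinityScore]: ...


class InMemoryAffinityRepository(AgentAffinityRepository):
    """Adapter: dict keyed by an (engine, topic) tuple."""

    def __init__(self) -> None:
        self._scores: dict[tuple[str, str], AgentAffinityScore] = {}

    def get(self, engine: str, topic: str) -> AgentAffinityScore | None:
        return self._scores.get((engine, topic))

    def get_by_topic(self, topic: str) -> list[AgentAffinityScore]:
        return [s for s in self._scores.values() if s.topic == topic]

    def get_by_engine(self, engine: str) -> list[AgentAffinityScore]:
        return [s for s in self._scores.values() if s.engine == engine]

    def upsert(self, score: AgentAffinityScore) -> None:
        self._scores[(score.engine, score.topic)] = score  # insert or replace

    def list_all(self) -> list[AgentAffinityScore]:
        return list(self._scores.values())
```

Because use cases depend only on the abstract port, the JSONL-backed store can replace the in-memory one in the container without touching application code.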

Domain Services (new):

  • AgentAffinityScorer — context-fit scoring (familiarity×0.4 + recency×0.25 + success×0.2 + cost×0.15)
  • ConflictResolver — confidence-weighted contradiction resolution
  • MemoryClassifier — auto-classify into 4 memory types
  • TopicExtractor — keyword-based topic classification for affinity lookup (Sprint 7.4)
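The scoring formula and the keyword-based topic lookup can be sketched together as below. The four weights and the ten topic names come from this document; the `cost_efficiency` input name, the assumption that all inputs are normalized to [0, 1], the keyword sets, and the "most keyword hits wins" rule are illustrative guesses rather than the project's actual tables.

```python
from __future__ import annotations

import re

# Weights from the AgentAffinityScorer description; they sum to 1.0.
W_FAMILIARITY, W_RECENCY, W_SUCCESS, W_COST = 0.4, 0.25, 0.2, 0.15


def affinity_score(
    familiarity: float, recency: float, success_rate: float, cost_efficiency: float
) -> float:
    """Weighted context-fit score; inputs assumed normalized to [0, 1]."""
    return (
        familiarity * W_FAMILIARITY
        + recency * W_RECENCY
        + success_rate * W_SUCCESS
        + cost_efficiency * W_COST
    )


# Hypothetical keyword sets; only the topic names are taken from the document.
TOPIC_KEYWORDS: dict[str, set[str]] = {
    "frontend": {"react", "css", "ui", "component"},
    "backend": {"api", "endpoint", "server"},
    "database": {"sql", "postgres", "migration", "schema"},
    "devops": {"docker", "ci", "deploy", "kubernetes"},
    "testing": {"test", "tests", "pytest", "coverage"},
    "security": {"auth", "vulnerability", "encryption"},
    "ml": {"model", "training", "embedding"},
    "data": {"csv", "etl", "pipeline", "dataset"},
    "docs": {"readme", "documentation", "docstring"},
    "refactoring": {"refactor", "cleanup", "rename"},
}


def extract_topic(text: str) -> str:
    """Return the topic with the most keyword hits, or "general" if none match."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    best, best_hits = "general", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


print(extract_topic("Add a Postgres migration for the users schema"))  # database
print(round(affinity_score(0.8, 0.5, 0.9, 1.0), 3))  # 0.775
```

With familiarity weighted at 0.4, an engine that has already worked on a topic outranks a cheaper engine that has not, unless the cheaper one is clearly ahead on the other three inputs.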

Use Cases (new/modified):

  • ExtractInsightsUseCase — post-execution: extract → conflict check → store → update task state
  • HandoffTaskUseCase — Agent A state capture → context prepare → Agent B delegation (Sprint 7.4)
  • RouteToEngineUseCase — extended with affinity routing + adapter injection + action recording (Sprint 7.4)
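The handoff sequence described for HandoffTaskUseCase can be sketched as a single function. Everything here is a simplified stand-in: the state is a stdlib dataclass, `inject` and `execute` are hypothetical callables standing in for the context adapter and RouteToEngine, a missing state is created on the fly (an assumption), and the optional insight-extraction step is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedTaskState:
    task_id: str
    decisions: list[str] = field(default_factory=list)
    artifacts: dict[str, str] = field(default_factory=dict)
    blockers: list[str] = field(default_factory=list)
    agent_history: list[tuple[str, str]] = field(default_factory=list)  # (engine, action_type)


def handoff(
    states: dict[str, SharedTaskState],
    task_id: str,
    source: str,
    target: str,
    reason: str,
    new_artifacts: dict[str, str],
    inject: Callable[[SharedTaskState], str],
    execute: Callable[[str, str], str],
) -> str:
    state = states.get(task_id) or SharedTaskState(task_id)             # load state
    state.agent_history.append((source, "handoff"))                     # record handoff action
    state.decisions.append(f"handoff {source} -> {target}: {reason}")   # add decision
    state.artifacts.update(new_artifacts)                               # merge artifacts
    context = inject(state)                                             # adapter inject
    output = execute(target, context)                                   # execute on target engine
    state.agent_history.append((target, "received_handoff"))            # record receipt
    states[task_id] = state                                             # persist
    return output


store: dict[str, SharedTaskState] = {}
result = handoff(
    store, "t1", "claude_code", "gemini", "needs long-context research",
    {"notes.md": "draft"},
    inject=lambda s: "\n".join(s.decisions),
    execute=lambda engine, ctx: f"{engine} received {len(ctx.splitlines())} decision(s)",
)
print(result)  # gemini received 1 decision(s)
```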

Infrastructure (new):

  • InMemoryAgentAffinityRepository — in-memory affinity store (Sprint 7.4)
  • JSONLAffinityStore — JSONL persistence for affinity scores (Sprint 7.4)
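The "lazy-load + full-overwrite on upsert" behavior noted for JSONLAffinityStore could be implemented along these lines. This is a sketch under assumptions: records are plain dicts with `engine` and `topic` keys, and only `get` and `upsert` are shown.

```python
from __future__ import annotations

import json
from pathlib import Path


class JSONLAffinityStore:
    def __init__(self, path: Path) -> None:
        self._path = path
        self._cache: dict[tuple[str, str], dict] | None = None  # None = not loaded yet

    def _load(self) -> dict[tuple[str, str], dict]:
        """Read the file on first access only (lazy load)."""
        if self._cache is None:
            self._cache = {}
            if self._path.exists():
                for line in self._path.read_text(encoding="utf-8").splitlines():
                    if line.strip():
                        rec = json.loads(line)
                        self._cache[(rec["engine"], rec["topic"])] = rec
        return self._cache

    def get(self, engine: str, topic: str) -> dict | None:
        return self._load().get((engine, topic))

    def upsert(self, record: dict) -> None:
        """Update the cache, then rewrite the whole file (full overwrite)."""
        cache = self._load()
        cache[(record["engine"], record["topic"])] = record
        payload = "".join(json.dumps(r) + "\n" for r in cache.values())
        self._path.write_text(payload, encoding="utf-8")
```

Rewriting the full file on every upsert keeps exactly one line per (engine, topic) pair, which stays cheap as long as the number of engine and topic combinations is small.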

Context Adapter Pattern

Each engine speaks a different "context language". Adapters work like OS device drivers for AI: a uniform UCL interface on one side, an engine-specific format on the other.

UCL Unified Memory ─── ContextAdapter ─── Engine-specific format
                         │
                    ┌────┴────┐
                    │ inject  │  UCL → engine (pre-execution)
                    │ extract │  engine → UCL (post-execution)
                    └─────────┘

| Engine | inject() format | extract() targets |
|---|---|---|
| Claude Code | CLAUDE.md + compressed memory | Decisions, code changes, reasoning |
| Codex CLI | AGENTS.md + task context | Code output, test results |
| Gemini CLI | Full context (2M window) | Research findings, analysis |
| ADK | Workflow state | Pipeline outputs, errors |
| OpenHands | REST task + sandbox state | Artifacts, test results |
| Ollama | ContextZipper compressed | Draft outputs |
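A bidirectional adapter in this pattern might be shaped like the sketch below. The `inject`/`extract` pair mirrors the diagram, but the signatures, the `TaskState` stand-in, the `MarkdownAdapter` class, and the `DECISION:` line convention used for extraction are all hypothetical; real adapters target each engine's own format.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TaskState:
    task_id: str
    decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)


class ContextAdapter(ABC):
    @abstractmethod
    def inject(self, state: TaskState) -> str:
        """UCL -> engine: render shared state in the engine's format (pre-execution)."""

    @abstractmethod
    def extract(self, output: str) -> list[str]:
        """Engine -> UCL: pull structured insights out of raw output (post-execution)."""


class MarkdownAdapter(ContextAdapter):
    """Hypothetical adapter for an engine that reads a markdown context file."""

    PREFIX = "DECISION:"  # assumed output convention, for illustration only

    def inject(self, state: TaskState) -> str:
        lines = [f"# Task {state.task_id}", "## Decisions"]
        lines += [f"- {d}" for d in state.decisions]
        lines += ["## Blockers"] + [f"- {b}" for b in state.blockers]
        return "\n".join(lines)

    def extract(self, output: str) -> list[str]:
        return [
            line[len(self.PREFIX):].strip()
            for line in output.splitlines()
            if line.startswith(self.PREFIX)
        ]


adapter = MarkdownAdapter()
context = adapter.inject(TaskState("t1", ["use SQLite"], ["flaky CI"]))
found = adapter.extract("DECISION: use FastAPI\nsome log noise\nDECISION: pin ruff")
print(found)  # ['use FastAPI', 'pin ruff']
```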

Builds on Phase 3 Foundations

| Existing (Phase 3) | UCL Role |
|---|---|
| SemanticFingerprint (LSH) | Dedup across agents (same concept = same hash) |
| ContextZipper | Per-engine compression (token budget varies) |
| ContextBridge | Evolves into Context Adapters |
| ForgettingCurve | UCL memory lifecycle management |
| DeltaEncoder | Shared state change tracking |
| HierarchicalSummarizer | Multi-level views for different context windows |
| L1-L4 Memory Hierarchy | 4 memory types map to existing layers |