Add catalog/IPC baselines and harden documented catalog parity #23
Closed
osamu2001 wants to merge 18 commits into
intent(benchmarks): capture before values for list_projects, list_tags, and the IPC hot path so later optimizations have a comparison baseline
decision(benchmarks): extend the baseline with benchmark-list-projects, benchmark-list-tags, and benchmark-bridge-health without changing the existing suite
decision(catalog-cache): use cacheTTL=0 during benchmarks so measurements reflect the query path instead of catalog cache hits
rejected(list-projects-parity): skipped the JXA parity baseline because it currently fails to decode and landed this as a plugin-only contract baseline
constraint(cli-contract): kept the normal MCP read path and existing benchmark output contract unchanged by making the new commands and gates additive
learned(ipc-baseline): separating bridge timing from end-to-end latency makes transport overhead tails much easier to see
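The learned(ipc-baseline) point can be illustrated with a small sketch. The sample shape, field names, and numbers below are hypothetical and not taken from the benchmark suite; the idea is only that subtracting bridge time from end-to-end time isolates transport overhead, so a slow round trip shows up in the tail percentile.

```typescript
// Hypothetical sample shape; both timings are in milliseconds.
interface IpcSample {
  endToEndMs: number; // measured by the caller around the whole request
  bridgeMs: number;   // time the bridge reports for its own work
}

// Nearest-rank percentile over a sorted copy of the input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Transport overhead is whatever the end-to-end time adds on top of the bridge.
function transportOverhead(samples: IpcSample[]): number[] {
  return samples.map((s) => s.endToEndMs - s.bridgeMs);
}

// Illustrative numbers only.
const ipcSamples: IpcSample[] = [
  { endToEndMs: 12, bridgeMs: 10 },
  { endToEndMs: 13, bridgeMs: 10 },
  { endToEndMs: 11, bridgeMs: 10 },
  { endToEndMs: 40, bridgeMs: 11 }, // one slow transport round trip
];
const overheadMs = transportOverhead(ipcSamples);
const overheadP50 = percentile(overheadMs, 50);
const overheadP95 = percentile(overheadMs, 95);
```

In this made-up data the bridge time barely moves, so the gap between the median and the tail is attributable to transport alone.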
intent(benchmark-gate): list-projects and list-tags smoke gates should fail on payload drift rather than stable bad responses
decision(benchmark-gate): compare bridge catalog pages against JXA parity baselines and scope jxa_probe to all-only
rejected(benchmark-gate): bridge-vs-bridge contract checks because consistently wrong payloads still pass
constraint(bridge-validation): plugin install timed out against the sandbox plugin directory so local live gate acceptance remains environment-blocked
learned(omni-automation): project and tag parity baselines needed full page metadata plus stalled and count field mapping before they were trustworthy
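A minimal sketch of why the gate compares against an independent baseline. The page shape (`totalCount`, `items`, `id`, `name`) and the function name are assumptions for illustration, not the project's actual payload contract. A bridge page that is wrong in the same way on every run would pass a bridge-vs-bridge check but is reported here.

```typescript
// Hypothetical page shape; the real catalog payload may carry more metadata.
interface CatalogItem { id: string; name: string }
interface CatalogPage { totalCount: number; items: CatalogItem[] }

// Returns one human-readable entry per difference; an empty list means parity.
function findPayloadDrift(bridge: CatalogPage, baseline: CatalogPage): string[] {
  const drift: string[] = [];
  if (bridge.totalCount !== baseline.totalCount) {
    drift.push("totalCount: " + bridge.totalCount + " != " + baseline.totalCount);
  }
  const length = Math.max(bridge.items.length, baseline.items.length);
  for (let i = 0; i < length; i++) {
    const got = bridge.items[i];
    const want = baseline.items[i];
    if (!got || !want) {
      drift.push("items[" + i + "]: present on one side only");
      continue;
    }
    if (got.id !== want.id || got.name !== want.name) {
      drift.push("items[" + i + "]: fields differ");
    }
  }
  return drift;
}

// Illustrative data: the bridge page silently drops the second project.
const jxaBaselinePage: CatalogPage = {
  totalCount: 2,
  items: [{ id: "p1", name: "Inbox cleanup" }, { id: "p2", name: "Taxes" }],
};
const truncatedBridgePage: CatalogPage = {
  totalCount: 2,
  items: [{ id: "p1", name: "Inbox cleanup" }],
};
const driftReport = findPayloadDrift(truncatedBridgePage, jxaBaselinePage);
```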
intent(jxa-parity): benchmark-gate acceptance should pass under Codex instead of stalling on local JXA authorization and type-conversion failures
decision(jxa-parity): retry -1743 failures through osascript and run list-projects/list-tags parity scripts inside OmniFocus via evaluateJavascript
rejected(jxa-parity): direct JXA document queries for catalog parity because task.taskStatus cannot be converted reliably in that context
constraint(benchmark-gate): keep public CLI and MCP contracts unchanged while restoring install-plugin, restart, swift test, and benchmark-gate acceptance
learned(omni-automation): the same catalog parity logic is stable once it runs in Omni Automation, but direct JXA and OSAKit have different failure modes in this environment
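The retry decision above might look roughly like the following. The runner functions are injected stand-ins, not the project's real OSAKit or osascript wrappers, and the result type is hypothetical. The only fixed point is that an Apple Event error of -1743 (not authorized to send Apple events) triggers the fallback while any other failure is returned unchanged.

```typescript
// Hypothetical result and runner types for illustration.
type RunResult =
  | { kind: "ok"; output: string }
  | { kind: "error"; code: number; message: string };
type ScriptRunner = (script: string) => RunResult;

const NOT_AUTHORIZED = -1743; // Apple Event "not permitted" error code

// Try the primary runner; fall back only for the authorization failure.
function runWithFallback(
  script: string,
  primary: ScriptRunner,
  fallback: ScriptRunner
): RunResult {
  const first = primary(script);
  if (first.kind === "ok") return first;
  if (first.code !== NOT_AUTHORIZED) return first; // other errors surface as-is
  return fallback(script);
}

// Stub runners standing in for the in-process and osascript paths.
let fallbackCalls = 0;
const deniedRunner: ScriptRunner = () => ({
  kind: "error", code: NOT_AUTHORIZED, message: "not authorized",
});
const brokenRunner: ScriptRunner = () => ({
  kind: "error", code: -1, message: "some other failure",
});
const osascriptStub: ScriptRunner = (script) => {
  fallbackCalls += 1;
  return { kind: "ok", output: "ran: " + script };
};

const recovered = runWithFallback("listTags()", deniedRunner, osascriptStub);
const notRetried = runWithFallback("listTags()", brokenRunner, osascriptStub);
```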
intent(project-stalled): list-projects parity should not silently mark non-empty projects as stalled when Omni Automation cannot provide stalled metadata
decision(project-stalled): fail the JXA stalled scenario explicitly when nextTask or containsSingletonActions are unavailable instead of coercing them through safe boolean fallbacks
rejected(project-stalled): defaulting unsupported stalled inputs to true or false because either guess corrupts gate results and benchmark baselines
constraint(benchmark-gate): keep the active_counts_stalled scenario while making unsupported field access surface as a parity failure
learned(omni-automation): stalled parity needs a supported-fields probe separate from ordinary nullable nextTask values
intent(tag-status): list-tags parity should not silently mix on-hold or dropped tags into the active scope when tag status is unavailable
decision(tag-status): treat missing or unrecognized Omni Automation tag status as an explicit unsupported failure instead of defaulting to active
rejected(tag-status): coercing unknown status to active because it hides runtime limitations and corrupts the active filter baseline
constraint(benchmark-gate): preserve the existing active tag scenarios while making unsupported status access surface clearly
learned(omni-automation): tag status parity is only trustworthy when the runtime yields a concrete status value or enum mapping
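As a sketch of the tag-status decision: map the raw runtime value to a known status, and report anything else as unsupported rather than defaulting to active. The string values, type names, and `activeTagNames` helper are hypothetical; the real script would map from whatever the Omni Automation runtime yields.

```typescript
type TagStatus = "active" | "onHold" | "dropped";
type TagStatusResult =
  | { kind: "ok"; status: TagStatus }
  | { kind: "unsupported"; reason: string };

// Hypothetical raw values; only these three are recognized.
const KNOWN_TAG_STATUSES: { [raw: string]: TagStatus } = {
  active: "active",
  onHold: "onHold",
  dropped: "dropped",
};

function resolveTagStatus(raw: unknown): TagStatusResult {
  if (typeof raw === "string" &&
      Object.prototype.hasOwnProperty.call(KNOWN_TAG_STATUSES, raw)) {
    return { kind: "ok", status: KNOWN_TAG_STATUSES[raw] };
  }
  // No guess: a missing or unknown status is an explicit failure.
  return { kind: "unsupported", reason: "unrecognized tag status: " + String(raw) };
}

// The active filter refuses to run over tags whose status cannot be resolved.
function activeTagNames(tags: { name: string; status: unknown }[]): string[] {
  const names: string[] = [];
  for (const tag of tags) {
    const resolved = resolveTagStatus(tag.status);
    if (resolved.kind === "unsupported") {
      throw new Error(tag.name + ": " + resolved.reason);
    }
    if (resolved.status === "active") names.push(tag.name);
  }
  return names;
}
```

Throwing here keeps on-hold or dropped tags from leaking into the active scope when the runtime cannot say what they are.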
intent(tag-counts): list-tags parity should not report zero tasks when Omni Automation cannot provide tag count collections
decision(tag-counts): surface missing availableTasks, remainingTasks, or tasks as explicit unsupported failures instead of converting null collections into empty arrays
rejected(tag-counts): zero-filling unsupported count inputs because it creates false parity failures and misleading benchmark baselines
constraint(benchmark-gate): keep the active_with_counts scenario while making unsupported tag count access obvious
learned(omni-automation): tag count parity is only meaningful after every convenience collection resolves to a concrete collection
intent(benchmarks): replace the stale smoke baselines added on this branch with measurements from the current bridge and gate state
decision(artifacts): keep only summary.md tracked for benchmark evidence and drop raw.jsonl from the commit scope
constraint(validation): smoke artifacts were regenerated only after swift test, plugin reinstall, OmniFocus restart, and list-projects/list-tags/all semantic gates passed
intent(benchmarks): capture merge-confidence baselines for catalog queries and bridge IPC in addition to the refreshed smoke artifacts
decision(artifacts): record realistic measurements in separate dated directories so smoke and realistic evidence stay comparable without overwriting each other
constraint(benchmarks): these runs use the documented realistic profile settings with summary.md tracked and raw.jsonl excluded
intent(progress-doc): replace the outdated catalog baseline writeup with one that matches the latest smoke and realistic measurements on this branch
decision(reporting): move the report to a new 2026-03-20 file so the document date matches the captured artifacts and branch state
constraint(references): remove all references to the superseded 2026-03-16 artifact paths and keep transport claims out of this report
intent(catalog-contract): make list_tags work using only the documented surface
decision(tag-counts): derive counts by aggregating taskStatus instead of relying on convenience pools
constraint(tag-pagination): include nested tags in totalCount even when flattenedTags is unavailable
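A rough sketch of deriving tag counts from per-task status values instead of convenience collections. The status names follow the Omni Automation `Task.Status` cases as the editor understands them, and the grouping into "available" and "remaining" is an assumption made for illustration, not the project's confirmed mapping.

```typescript
// Status names assumed to mirror Omni Automation's Task.Status cases.
type TaskStatusName =
  | "Available" | "Blocked" | "Completed" | "Dropped"
  | "DueSoon" | "Next" | "Overdue";

interface TagCounts { available: number; remaining: number; total: number }

// Assumed grouping: these statuses count as actionable right now.
const AVAILABLE_STATUSES: TaskStatusName[] = ["Available", "DueSoon", "Next", "Overdue"];

function countByTaskStatus(statuses: TaskStatusName[]): TagCounts {
  const counts: TagCounts = { available: 0, remaining: 0, total: 0 };
  for (const status of statuses) {
    counts.total += 1;
    // Assumed: remaining means neither completed nor dropped.
    if (status !== "Completed" && status !== "Dropped") counts.remaining += 1;
    if (AVAILABLE_STATUSES.indexOf(status) !== -1) counts.available += 1;
  }
  return counts;
}

// Illustrative tag with five tasks.
const exampleTagCounts = countByTaskStatus([
  "Available", "Blocked", "Completed", "Next", "Dropped",
]);
```

Because every count comes from one pass over `taskStatus`, the bridge and the parity script can share the same aggregation rule.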
intent(catalog-parity): align the tag aggregation models used by the bridge and JXA
decision(tag-enumeration): use a full traversal of root tags and their children as the fallback
intent(project-health): do not let unsupported fields look like parity successes
constraint(jxa-projects): undefined means unsupported; a null nextTask is a legitimate stalled value
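The undefined-versus-null constraint can be sketched as follows. The probe shape and the stalled rule (remaining work, no next task, not a single-actions list) are assumptions for illustration; the point is only that `undefined` is reported as unsupported while `null` flows into the computation as a real value.

```typescript
// Hypothetical probe of a project as seen by the parity script.
interface ProjectProbe {
  remainingCount: number;
  nextTask?: { id: string } | null;   // undefined: runtime does not expose it
  containsSingletonActions?: boolean; // undefined: runtime does not expose it
}

type StalledResult =
  | { kind: "ok"; stalled: boolean }
  | { kind: "unsupported"; field: string };

function evaluateStalled(project: ProjectProbe): StalledResult {
  if (project.nextTask === undefined) {
    return { kind: "unsupported", field: "nextTask" };
  }
  if (project.containsSingletonActions === undefined) {
    return { kind: "unsupported", field: "containsSingletonActions" };
  }
  // Assumed rule: work remains, nothing is next, and it is not a single-actions list.
  const stalled =
    project.remainingCount > 0 &&
    project.nextTask === null &&
    !project.containsSingletonActions;
  return { kind: "ok", stalled };
}

const stalledProject = evaluateStalled({
  remainingCount: 3, nextTask: null, containsSingletonActions: false,
});
const unsupportedProject = evaluateStalled({
  remainingCount: 3, containsSingletonActions: false,
});
```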
intent(gates): decouple benchmark-gate-check from whether unsupported surfaces are present
rejected(project-health): using nextTask and containsSingletonActions as the gate's source of truth
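intent(contract-docs): resolve the mismatch between the implementation and the review checklist
learned(catalog-contract): unless the gate is limited to the documented surface, it produces both false positives and false negatives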
intent(tag-fallback): make fallback enumeration from root tags work even when flattenedTags is unsupported
constraint(tag-fallback): keep the local tagItems separate from OmniFocus's global tags to avoid the TDZ (temporal dead zone)
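A minimal sketch of the fallback enumeration, assuming a simple tree shape. `TagNode`, `enumerateTags`, and the sample names are hypothetical. The local accumulator is named `tagItems` on purpose: inside an Omni Automation script, declaring a local `const tags` that reads the host's global `tags` in its own initializer would hit the temporal dead zone.

```typescript
// Hypothetical tree shape for tags.
interface TagNode { name: string; children?: TagNode[] }

// Prefer the flattened list when the runtime provides it; otherwise walk
// every root and its descendants in pre-order.
function enumerateTags(flattened: TagNode[] | undefined, roots: TagNode[]): TagNode[] {
  if (flattened !== undefined) return flattened;
  // Named tagItems, not tags, so it cannot shadow a host-provided global.
  const tagItems: TagNode[] = [];
  const stack: TagNode[] = roots.slice().reverse();
  while (stack.length > 0) {
    const node = stack.pop() as TagNode;
    tagItems.push(node);
    const children = node.children || [];
    // Push in reverse so the first child is visited first.
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return tagItems;
}

// Illustrative tree: two roots, one with two nested tags.
const rootTags: TagNode[] = [
  { name: "A", children: [{ name: "A1" }, { name: "A2" }] },
  { name: "B" },
];
const enumerated = enumerateTags(undefined, rootTags);
const enumeratedNames = enumerated.map((t) => t.name).join(",");
```

The length of the enumerated list is what a `totalCount` that includes nested tags would be computed from.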
intent(osascript-fallback): keep the Apple Event recovery path usable for large list_tasks and list_projects responses
decision(pipe-drain): start stdout and stderr readers before waitUntilExit so osascript cannot block on a full pipe buffer
constraint(automation-errors): preserve the existing AutomationError surface while making the drain order testable
intent(list-projects): omitted fields calls must keep the legacy default payload instead of probing unsupported project extras
decision(project-fields): gate nextTask, containsSingletonActions, hasChildren, and isStalled behind explicit field requests while preserving includeTaskCounts
constraint(project-query): keep the completedBefore exclusive filter fix in the same script path while restoring omitted-fields compatibility
learned(script-capture): the JXA source assertions need to tolerate escaped inner script strings when validating generated automation code
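The field-gating decision could be sketched like this. The four gated field names come from the commit message; the default field list, the function names, and the timestamp representation are hypothetical.

```typescript
// Hypothetical legacy default payload fields.
const DEFAULT_PROJECT_FIELDS = ["id", "name", "status"];
// Extras named in the commit; probed only on explicit request.
const GATED_PROJECT_FIELDS = [
  "nextTask", "containsSingletonActions", "hasChildren", "isStalled",
];

interface FieldPlan { fields: string[]; gatedToProbe: string[] }

function planProjectFields(requested?: string[]): FieldPlan {
  // Omitted fields: keep the legacy payload and touch no gated extras.
  if (requested === undefined) {
    return { fields: DEFAULT_PROJECT_FIELDS.slice(), gatedToProbe: [] };
  }
  const gatedToProbe = requested.filter(
    (f) => GATED_PROJECT_FIELDS.indexOf(f) !== -1
  );
  return { fields: requested.slice(), gatedToProbe };
}

// Exclusive bound: a project completed exactly at the cutoff is not included.
function isCompletedBefore(completedAtMs: number, completedBeforeMs: number): boolean {
  return completedAtMs < completedBeforeMs;
}

const legacyPlan = planProjectFields();
const explicitPlan = planProjectFields(["name", "isStalled"]);
```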
intent(benchmark): failed benchmark calls must honor cooldown even when the bridge fails without a timeout
decision(cooldown): make the shared cooldown helper unconditional for failure paths because its callers already sit on error branches
decision(bridge-health): treat ok:false health responses as degraded failures so unhealthy plugin probes back off like thrown bridge errors
learned(benchmark-tests): a small async cooldown assertion plus source guards catches both helper regressions and the bridge-health ok:false path
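A simplified, synchronous sketch of the two cooldown decisions. The outcome type and function names are hypothetical, and the real helper is presumably asynchronous; this version only computes how long to wait so the rule itself is easy to see: every non-ok outcome backs off, and an `ok:false` health response is classified as a failure.

```typescript
// Hypothetical outcome of one benchmark call.
type CallOutcome =
  | { kind: "ok" }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

interface HealthResponse { ok: boolean }

// An unhealthy probe is a degraded failure, not a success with bad news.
function classifyHealth(response: HealthResponse): CallOutcome {
  return response.ok
    ? { kind: "ok" }
    : { kind: "error", message: "bridge health reported ok:false" };
}

// Unconditional for failures: timeouts and plain errors both cool down.
function cooldownMsFor(outcome: CallOutcome, cooldownMs: number): number {
  return outcome.kind === "ok" ? 0 : cooldownMs;
}

const unhealthyWait = cooldownMsFor(classifyHealth({ ok: false }), 250);
const healthyWait = cooldownMsFor(classifyHealth({ ok: true }), 250);
const timeoutWait = cooldownMsFor({ kind: "timeout" }, 250);
```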
Summary
Validation