
Add catalog/IPC baselines and harden documented catalog parity #23

Closed
osamu2001 wants to merge 18 commits into deverman:master from osamu2001:chore/catalog-ipc-baselines

Conversation

@osamu2001
Contributor

Summary

  • add catalog and IPC benchmark commands plus gate coverage for list-projects, list-tags, and bridge health
  • align project/tag parity logic with documented Omni Automation surfaces and fail unsupported fields explicitly instead of guessing
  • fix the osascript fallback pipe drain order, preserve default listProjects payloads, and restore cooldown handling for degraded benchmark failures
  • refresh smoke/realistic benchmark summaries and the catalog progress documentation

Validation

  • swift test
  • benchmark-gate-check --tool list-projects
  • benchmark-gate-check --tool list-tags
  • benchmark-gate-check --tool all

intent(benchmarks): capture before values for list_projects, list_tags, and the IPC hot path so later optimizations have a comparison baseline

decision(benchmarks): extend the baseline with benchmark-list-projects, benchmark-list-tags, and benchmark-bridge-health without changing the existing suite
decision(catalog-cache): use cacheTTL=0 during benchmarks so measurements reflect the query path instead of catalog cache hits
rejected(list-projects-parity): skipped the JXA parity baseline because it currently fails to decode and landed this as a plugin-only contract baseline
constraint(cli-contract): kept the normal MCP read path and existing benchmark output contract unchanged by making the new commands and gates additive
learned(ipc-baseline): separating bridge timing from end-to-end latency makes transport overhead tails much easier to see
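
A minimal sketch of the idea behind the learned(ipc-baseline) note: subtract the bridge time from the end-to-end time per sample, then look at percentiles of the difference so the transport tail stands out. The `TimingSample` shape, the function names, and the nearest-rank percentile are assumptions for illustration, not the benchmark commands' actual code.

```typescript
// Hypothetical sample shape: both timings in milliseconds for a single call.
interface TimingSample {
  endToEndMs: number; // wall-clock time seen by the benchmark caller
  bridgeMs: number;   // time attributed to the bridge portion only
}

// Nearest-rank percentile over a non-empty list (p between 0 and 100).
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error("percentile requires at least one value");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

// Transport overhead is whatever the end-to-end time adds on top of the bridge time.
function transportOverhead(samples: TimingSample[]): number[] {
  return samples.map((s) => s.endToEndMs - s.bridgeMs);
}

function summarizeOverhead(samples: TimingSample[]): { p50: number; p95: number } {
  const overhead = transportOverhead(samples);
  return { p50: percentile(overhead, 50), p95: percentile(overhead, 95) };
}
```

Reporting the overhead separately keeps one slow transport hop from being averaged away inside the end-to-end number.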
intent(benchmark-gate): list-projects and list-tags smoke gates should fail on payload drift rather than stable bad responses

decision(benchmark-gate): compare bridge catalog pages against JXA parity baselines and scope jxa_probe to --tool all runs only

rejected(benchmark-gate): bridge-vs-bridge contract checks because consistently wrong payloads still pass

constraint(bridge-validation): plugin install timed out against the sandbox plugin directory so local live gate acceptance remains environment-blocked

learned(omni-automation): project and tag parity baselines needed full page metadata plus stalled and count field mapping before they were trustworthy
intent(jxa-parity): benchmark-gate acceptance should pass under Codex instead of stalling on local JXA authorization and type-conversion failures

decision(jxa-parity): retry -1743 failures through osascript and run list-projects/list-tags parity scripts inside OmniFocus via evaluateJavascript

rejected(jxa-parity): direct JXA document queries for catalog parity because task.taskStatus cannot be converted reliably in that context

constraint(benchmark-gate): keep public CLI and MCP contracts unchanged while restoring install-plugin, restart, swift test, and benchmark-gate acceptance

learned(omni-automation): the same catalog parity logic is stable once it runs in Omni Automation, but direct JXA and OSAKit have different failure modes in this environment
intent(project-stalled): list-projects parity should not silently mark non-empty projects as stalled when Omni Automation cannot provide stalled metadata

decision(project-stalled): fail the JXA stalled scenario explicitly when nextTask or containsSingletonActions are unavailable instead of coercing them through safe boolean fallbacks

rejected(project-stalled): defaulting unsupported stalled inputs to true or false because either guess corrupts gate results and benchmark baselines

constraint(benchmark-gate): keep the active_counts_stalled scenario while making unsupported field access surface as a parity failure

learned(omni-automation): stalled parity needs a supported-fields probe separate from ordinary nullable nextTask values
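
The project-stalled notes above can be sketched roughly as follows: probe for field support first, and only then interpret a nullable `nextTask`. The `ProjectProbe` shape, the result type, and the stalled rule itself are assumptions made for illustration; only the undefined-versus-null distinction comes from the notes.

```typescript
// Hypothetical view of a project as read inside the automation runtime.
// A property the runtime does not expose reads as `undefined`;
// `nextTask === null` is a real answer meaning "there is no next task".
interface ProjectProbe {
  name: string;
  nextTask?: object | null;
  containsSingletonActions?: boolean;
}

type StalledResult =
  | { kind: "ok"; stalled: boolean }
  | { kind: "unsupported"; fields: string[] };

function evaluateStalled(project: ProjectProbe): StalledResult {
  // Supported-fields probe: collect every input the runtime failed to provide.
  const missing: string[] = [];
  if (project.nextTask === undefined) missing.push("nextTask");
  if (project.containsSingletonActions === undefined) {
    missing.push("containsSingletonActions");
  }
  if (missing.length > 0) {
    // No boolean fallback: the caller must report this as a parity failure.
    return { kind: "unsupported", fields: missing };
  }
  // Assumed rule, for illustration only: no next task and not a single-actions list.
  const stalled =
    project.nextTask === null && project.containsSingletonActions === false;
  return { kind: "ok", stalled };
}
```

Returning a tagged result instead of a boolean forces the gate to handle the unsupported case explicitly rather than guessing true or false.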
intent(tag-status): list-tags parity should not silently mix on-hold or dropped tags into the active scope when tag status is unavailable

decision(tag-status): treat missing or unrecognized Omni Automation tag status as an explicit unsupported failure instead of defaulting to active

rejected(tag-status): coercing unknown status to active because it hides runtime limitations and corrupts the active filter baseline

constraint(benchmark-gate): preserve the existing active tag scenarios while making unsupported status access surface clearly

learned(omni-automation): tag status parity is only trustworthy when the runtime yields a concrete status value or enum mapping
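
One way to picture the tag-status decision above: map the runtime value to a concrete status and stop with an explicit failure when it is missing or unrecognized. The string keys below are stand-ins for whatever the runtime yields, and all names are hypothetical.

```typescript
type TagStatus = "active" | "onHold" | "dropped";

type StatusResult =
  | { kind: "ok"; status: TagStatus }
  | { kind: "unsupported"; reason: string };

// Stand-in mapping; the real runtime values may be enum objects rather than strings.
const KNOWN_STATUSES = new Map<string, TagStatus>([
  ["Active", "active"],
  ["OnHold", "onHold"],
  ["Dropped", "dropped"],
]);

function mapTagStatus(raw: unknown): StatusResult {
  if (raw === undefined || raw === null) {
    return { kind: "unsupported", reason: "tag status is unavailable" };
  }
  const mapped = KNOWN_STATUSES.get(String(raw));
  if (mapped === undefined) {
    return { kind: "unsupported", reason: `unrecognized tag status: ${String(raw)}` };
  }
  return { kind: "ok", status: mapped };
}

// Hypothetical tag view used by the active-scope filter.
interface TagProbe {
  name: string;
  status?: unknown;
}

type ActiveTagsResult =
  | { kind: "ok"; names: string[] }
  | { kind: "unsupported"; reason: string };

// Builds the active scope; one unsupported status fails the whole scenario
// instead of being counted as active.
function activeTagNames(tags: TagProbe[]): ActiveTagsResult {
  const names: string[] = [];
  for (const tag of tags) {
    const result = mapTagStatus(tag.status);
    if (result.kind === "unsupported") {
      return { kind: "unsupported", reason: `${tag.name}: ${result.reason}` };
    }
    if (result.status === "active") names.push(tag.name);
  }
  return { kind: "ok", names };
}
```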
intent(tag-counts): list-tags parity should not report zero tasks when Omni Automation cannot provide tag count collections

decision(tag-counts): surface missing availableTasks, remainingTasks, or tasks as explicit unsupported failures instead of converting null collections into empty arrays

rejected(tag-counts): zero-filling unsupported count inputs because it creates false parity failures and misleading benchmark baselines

constraint(benchmark-gate): keep the active_with_counts scenario while making unsupported tag count access obvious

learned(omni-automation): tag count parity is only meaningful after every convenience collection resolves to a concrete collection
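
The tag-counts decision above amounts to refusing to treat a missing collection as an empty one. A small sketch under assumed names (`TagCollections`, `countTagTasks`); the three collection names are the ones listed in the decision.

```typescript
// Hypothetical view of a tag's convenience collections as seen by the parity script.
interface TagCollections {
  availableTasks?: unknown[] | null;
  remainingTasks?: unknown[] | null;
  tasks?: unknown[] | null;
}

type CountResult =
  | { kind: "ok"; available: number; remaining: number; total: number }
  | { kind: "unsupported"; fields: string[] };

function countTagTasks(tag: TagCollections): CountResult {
  const { availableTasks, remainingTasks, tasks } = tag;
  if (
    !Array.isArray(availableTasks) ||
    !Array.isArray(remainingTasks) ||
    !Array.isArray(tasks)
  ) {
    // Name every collection that did not resolve, instead of zero-filling it.
    const entries: [string, unknown][] = [
      ["availableTasks", availableTasks],
      ["remainingTasks", remainingTasks],
      ["tasks", tasks],
    ];
    const fields = entries
      .filter(([, value]) => !Array.isArray(value))
      .map(([name]) => name);
    return { kind: "unsupported", fields };
  }
  // An empty array is a legitimate zero; null or undefined never reaches this point.
  return {
    kind: "ok",
    available: availableTasks.length,
    remaining: remainingTasks.length,
    total: tasks.length,
  };
}
```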
intent(benchmarks): replace the stale smoke baselines added on this branch with measurements from the current bridge and gate state
decision(artifacts): keep only summary.md tracked for benchmark evidence and drop raw.jsonl from the commit scope
constraint(validation): smoke artifacts were regenerated only after swift test, plugin reinstall, OmniFocus restart, and list-projects/list-tags/all semantic gates passed
intent(benchmarks): capture merge-confidence baselines for catalog queries and bridge IPC in addition to the refreshed smoke artifacts
decision(artifacts): record realistic measurements in separate dated directories so smoke and realistic evidence stay comparable without overwriting each other
constraint(benchmarks): these runs use the documented realistic profile settings with summary.md tracked and raw.jsonl excluded
intent(progress-doc): replace the outdated catalog baseline writeup with one that matches the latest smoke and realistic measurements on this branch
decision(reporting): move the report to a new 2026-03-20 file so the document date matches the captured artifacts and branch state
constraint(references): remove all references to the superseded 2026-03-16 artifact paths and keep transport claims out of this report
intent(catalog-contract): make list_tags work using only documented surfaces
decision(tag-counts): derive counts by aggregating taskStatus rather than from convenience pools
constraint(tag-pagination): include nested tags in totalCount even when flattenedTags is absent
intent(catalog-parity): align the tag aggregation model between the bridge and JXA
decision(tag-enumeration): use a full traversal of root tags and their children as the fallback
intent(project-health): do not let unsupported fields look like parity success
constraint(jxa-projects): undefined means unsupported; a null nextTask is a legitimate stalled value
intent(gates): decouple benchmark-gate-check from whether unsupported surfaces are present
rejected(project-health): using nextTask and containsSingletonActions as the gate's ground truth
intent(contract-docs): resolve the mismatch between the implementation and the review checklist
learned(catalog-contract): unless the gate is limited to documented surfaces, it produces both false positives and false negatives
intent(tag-fallback): make fallback enumeration from root tags work even when flattenedTags is unsupported
constraint(tag-fallback): keep the local tagItems separate from OmniFocus's global tags to avoid the TDZ
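
The tag-enumeration and tag-fallback notes above describe a traversal along these lines. This is a sketch with hypothetical shapes (`TagNode`, `TagSource`, `rootTags`), not the script shipped on the branch; it only shows the fallback walk, the nested-inclusive `totalCount`, and the local naming that avoids colliding with a global `tags`.

```typescript
// Hypothetical tag shape; the real runtime objects carry more than this.
interface TagNode {
  name: string;
  children: TagNode[];
}

// Hypothetical source view: flattenedTags may be missing, root tags are present.
interface TagSource {
  flattenedTags?: TagNode[];
  rootTags: TagNode[];
}

// Depth-first walk over the given tags and every descendant.
function walkTags(nodes: TagNode[], out: TagNode[]): void {
  for (const node of nodes) {
    out.push(node);
    walkTags(node.children, out);
  }
}

function enumerateTags(source: TagSource): TagNode[] {
  if (Array.isArray(source.flattenedTags)) {
    return source.flattenedTags;
  }
  // Named tagItems, not tags, so the local never shadows a global called `tags`.
  const tagItems: TagNode[] = [];
  walkTags(source.rootTags, tagItems);
  return tagItems;
}

function pageTags(
  source: TagSource,
  offset: number,
  limit: number,
): { names: string[]; totalCount: number } {
  const tagItems = enumerateTags(source);
  return {
    names: tagItems.slice(offset, offset + limit).map((t) => t.name),
    totalCount: tagItems.length, // nested tags are counted as well
  };
}
```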
intent(osascript-fallback): keep the Apple Event recovery path usable for large list_tasks and list_projects responses
decision(pipe-drain): start stdout and stderr readers before waitUntilExit so osascript cannot block on a full pipe buffer
constraint(automation-errors): preserve the existing AutomationError surface while making the drain order testable
intent(list-projects): calls that omit fields must keep the legacy default payload instead of probing unsupported project extras
decision(project-fields): gate nextTask, containsSingletonActions, hasChildren, and isStalled behind explicit field requests while preserving includeTaskCounts
constraint(project-query): keep the completedBefore exclusive filter fix in the same script path while restoring omitted-fields compatibility
learned(script-capture): the JXA source assertions need to tolerate escaped inner script strings when validating generated automation code
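
A rough sketch of the project-fields gating described above, assuming a hypothetical query shape and a made-up default field list. The four opt-in names are the ones in the decision; everything else, including the `false` default for `includeTaskCounts`, is illustrative.

```typescript
// Illustrative default list only; the real legacy payload is not spelled out here.
const DEFAULT_PROJECT_FIELDS: readonly string[] = ["id", "name", "status"];

// Extras that are probed only when a caller asks for them by name.
const OPT_IN_PROJECT_FIELDS: readonly string[] = [
  "nextTask",
  "containsSingletonActions",
  "hasChildren",
  "isStalled",
];

// Hypothetical request and plan shapes.
interface ProjectQuery {
  fields?: readonly string[];
  includeTaskCounts?: boolean;
}

interface ProjectQueryPlan {
  fields: string[];
  extras: string[]; // opt-in fields the script is allowed to probe
  includeTaskCounts: boolean;
}

function planProjectQuery(query: ProjectQuery): ProjectQueryPlan {
  // Passed through untouched; it is independent of the fields list.
  const includeTaskCounts = query.includeTaskCounts ?? false;
  const requested = query.fields;
  if (requested === undefined) {
    // Omitted fields: legacy default payload, no extras probed.
    return { fields: [...DEFAULT_PROJECT_FIELDS], extras: [], includeTaskCounts };
  }
  const extras = requested.filter((f) => OPT_IN_PROJECT_FIELDS.includes(f));
  return { fields: [...requested], extras, includeTaskCounts };
}
```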
intent(benchmark): failed benchmark calls must honor cooldown even when the bridge fails without a timeout
decision(cooldown): make the shared cooldown helper unconditional for failure paths because its callers already sit on error branches
decision(bridge-health): treat ok:false health responses as degraded failures so unhealthy plugin probes back off like thrown bridge errors
learned(benchmark-tests): a small async cooldown assertion plus source guards catches both helper regressions and the bridge-health ok:false path
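
The cooldown and bridge-health decisions above can be sketched as below. The helper and type names are hypothetical, and the sleep function is injected so an async assertion can observe the cooldown without waiting in real time.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Hypothetical health payload; only the ok flag matters for this sketch.
interface HealthResponse {
  ok: boolean;
}

// Unconditional by design: callers only reach this helper from failure branches,
// so it does not inspect whether the failure was a timeout.
async function cooldownAfterFailure(cooldownMs: number, sleep: Sleep): Promise<void> {
  await sleep(cooldownMs);
}

async function probeHealth(
  check: () => Promise<HealthResponse>,
  cooldownMs: number,
  sleep: Sleep,
): Promise<"healthy" | "degraded"> {
  try {
    const response = await check();
    if (response.ok) return "healthy";
    // ok:false falls through and is treated like a thrown bridge error.
  } catch {
    // Thrown bridge errors take the same degraded path.
  }
  await cooldownAfterFailure(cooldownMs, sleep);
  return "degraded";
}
```

Routing both the thrown error and the `ok:false` response through a single exit keeps the back-off behavior identical for every kind of failure.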
@osamu2001 closed this on Mar 26, 2026