apps/cli: two-phase AI build workflow with blockify skill #3207
youknowriad wants to merge 5 commits into
Restructures the AI agent's site-build flow into explicit phases and adds a user-invokable skill for HTML → Gutenberg block conversion, measured against a sequence of full-build sessions to validate each change.

Phase 1: HTML prototype. The agent writes plain HTML/CSS/JS under `<site>/tmp/prototype/` with a section-anchor skeleton, then fills one anchor per Edit. Design tokens are locked in a `tokens` anchor before any section fill. Theme writes are forbidden in this phase so the design is screenshot-approved before block markup enters the picture.

Phase 2: Port to block theme. The agent invokes the new `blockify` skill (apps/cli/ai/plugin/skills/blockify/SKILL.md) as a gate before writing block markup. The theme stylesheet is `cp`-ed from the prototype and adjusted via small Edits for block-DOM selectors (`.wp-block-button`, `.wp-block-image`), replacing a prior 60–90s silent regeneration. Page content is built in `<site>/tmp/page-<slug>.html` and applied via `wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'`. The prior `--post_content-file=<host path>` pattern silently failed because wp_cli runs in a WASM filesystem that cannot see host paths.

Working cadence is split into content creation (one tool per turn to avoid silent generation cliffs) and fix-up loops (multiple Edits per turn when validate_blocks or take_screenshot report multiple issues). The validate_blocks tool description is updated in lockstep so it no longer instructs the agent to validate after every individual Edit.

`maxTurns` default is raised to 100 to give headroom for the added section-by-section turns without truncating builds.
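The eval-based content application described above can be sketched as a small string builder. This is a minimal sketch under stated assumptions, not code from this PR: `buildApplyContentEval` and its parameters are hypothetical, the `wp_update_post` arguments are a guess at what the commit elides as `...`, and `wp_slash` is added on the assumption that block-attribute JSON may contain backslashes that `wp_update_post` would otherwise strip.

```typescript
// Hypothetical helper: builds the PHP snippet handed to `wp_cli eval`.
// The file path is relative to ABSPATH because wp_cli cannot see host paths.
function buildApplyContentEval(slug: string, postId: number): string {
  // Restrict the slug so it cannot escape tmp/ or break the PHP string literal.
  if (!/^[a-z0-9-]+$/.test(slug)) {
    throw new Error(`unexpected slug: ${slug}`);
  }
  if (!Number.isInteger(postId) || postId <= 0) {
    throw new Error(`unexpected post ID: ${postId}`);
  }
  // The snippet contains no single quotes, so it can be wrapped in '...'.
  return [
    `$content = file_get_contents(ABSPATH . "tmp/page-${slug}.html");`,
    `if ($content === false) { echo "read failed"; }`,
    // Assumption: wp_slash() protects backslashes from being unslashed on save.
    `else { wp_update_post(wp_slash(["ID" => ${postId}, "post_content" => $content])); echo "ok"; }`,
  ].join(" ");
}

// Example: the PHP body for page "home" stored as post 12.
const snippet = buildApplyContentEval("home", 12);
console.log(`wp_cli eval '${snippet}'`);
```

Keeping the path ABSPATH-relative is the point of the pattern: the same `tmp/page-<slug>.html` file is visible on the host under `<site>/tmp/` and inside the WASM filesystem.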
…o-phase style
The remote-site system prompt had a handful of inherited problems from earlier
iterations: PHASE 2's HTML prototype told the agent to write to `<site>/tmp/...`
which doesn't exist on a remote-only session, the `${WORK_CADENCE}` include
leaked local-only rules (theme `cp`, ABSPATH eval) into a context that never
touches a theme file, and PHASE 3 had no `blockify` invocation before block
markup was produced.
Restructured workflow while keeping the 3-phase split that's natural for
remote (Audit / Prototype / Port):
- PHASE 1 Audit: plan gate + discovery + fetch global-styles ID.
- PHASE 2 Prototype: HTML/CSS/JS written to `~/.studio/tmp/prototype-<slug>/`
on the local CLI machine, with an explicit FORBIDDEN list that bars any
remote write (`wpcom_request` POST/PUT, plugin install, theme switch,
global-styles edit) until the prototype screenshot is approved.
- PHASE 3 Port: invoke `blockify` first, translate to block markup in a
local scratch file (so re-sends work if a POST fails), apply content via
`wpcom_request POST /posts`, CSS via `POST /global-styles/<id>` with
`settings.custom` (paid plans only), template changes via
`POST /templates/<id>`, screenshot verify.
Replaced the shared `${WORK_CADENCE}` include with an inline remote cadence:
one content-producing `wpcom_request` per turn, GETs combinable, local
Write/Edit cadence during phase 2, anti-screenshot-serialization rule.
Narrowed the IMPORTANT line about tool restrictions: `Bash`/`Write`/`Edit`
are allowed for prototype scratch files, forbidden for the remote site
itself (only wpcom_request can change the site).
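The phase gate and the remote cadence above can be read as two checks on the tool calls in a single turn. The sketch below is illustrative only: the `ToolCall` shape, `checkRemoteTurn`, and the `prototypeApproved` flag are hypothetical names, and it assumes that every non-GET `wpcom_request` counts as a content-producing remote write.

```typescript
// Hypothetical model of the tool calls an agent makes within one turn.
type ToolCall =
  | { tool: "wpcom_request"; method: "GET" | "POST" | "PUT" | "DELETE"; path: string }
  | { tool: "Write" | "Edit" | "Bash" | "take_screenshot" };

interface TurnCheck {
  ok: boolean;
  reason?: string;
}

function checkRemoteTurn(calls: ToolCall[], prototypeApproved: boolean): TurnCheck {
  // Assumption: any non-GET wpcom_request changes the remote site.
  const remoteWrites = calls.filter(
    (c) => c.tool === "wpcom_request" && c.method !== "GET",
  );
  // Phase gate: no remote write until the prototype screenshot is approved.
  if (!prototypeApproved && remoteWrites.length > 0) {
    return { ok: false, reason: "remote write before prototype approval" };
  }
  // Cadence: at most one content-producing request per turn; GETs may be combined.
  if (remoteWrites.length > 1) {
    return { ok: false, reason: "more than one content-producing wpcom_request in a turn" };
  }
  return { ok: true };
}

// Two GETs in one audit turn are fine, even before approval.
console.log(
  checkRemoteTurn(
    [
      { tool: "wpcom_request", method: "GET", path: "/posts" },
      { tool: "wpcom_request", method: "GET", path: "/templates" },
    ],
    false,
  ),
);
```

Local `Write`/`Edit`/`Bash` calls pass both checks, which matches the narrowed rule that they are allowed for prototype scratch files.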
…is-bbce01 # Conflicts: # apps/cli/ai/agent.ts
…zation
Addresses three recurring regressions observed in recent build sessions (double button padding/borders, content centered too narrow, missing section padding). All three share a root cause: WordPress block DOM and theme.json defaults inject paint that fights prototype CSS when it is copied verbatim to the theme.
Blockify skill:
- New "CSS migration after conversion" section with concrete rules for buttons, images, groups, and padding.
- Button rule is the highest-leverage: the `.wp-block-button` wrapper gets ZERO paint; all paint (background, border, padding, color, hover) goes on `.wp-block-button.<className> .wp-block-button__link`. Prevents the classic double-border artifact caused by `className` landing on the wrapper while `wp-element-button` defaults still paint the inner.
- Image rule splits figure-level vs img-level selectors.
- Group/section rule explains `is-layout-constrained` × `contentSize` and how to match the prototype's max-width.
- Padding rule keeps section padding on the className (CSS), not on block `style.spacing.padding` attributes, and surfaces theme.json defaults as the other common culprit.
System prompt PHASE 2 step 1:
- `theme.json` now MUST set `settings.layout.contentSize`/`wideSize` from the prototype's actual max-widths, and `styles.elements.button` to neutralize `wp-element-button`'s default paint. This makes the prototype CSS the only source of truth for visual styling.
- Block-DOM adjustments list now mirrors the blockify skill's CSS migration rules, with a pointer to the skill for context.
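As a rough illustration of the two rules above, the sketch below builds a `theme.json` fragment and the button selector. It is a sketch under assumptions: `buildThemeJsonFragment` and `buttonPaintSelector` are hypothetical helpers, and the specific neutral values (`transparent`, `inherit`, `0`) are one guess at what "neutralize" could mean, since the commit does not list them.

```typescript
// Hypothetical helper: a theme.json fragment to merge into the theme's file.
// contentSize and wideSize should come from the prototype's real max-widths.
function buildThemeJsonFragment(contentSize: string, wideSize: string) {
  return {
    settings: {
      layout: { contentSize, wideSize },
    },
    styles: {
      elements: {
        // Assumed neutral values so wp-element-button adds no paint of its own.
        button: {
          color: { background: "transparent", text: "inherit" },
          border: { width: "0", radius: "0" },
          spacing: { padding: { top: "0", right: "0", bottom: "0", left: "0" } },
        },
      },
    },
  };
}

// Per the button rule: the wrapper gets no paint, the inner link gets all of it.
function buttonPaintSelector(className: string): string {
  return `.wp-block-button.${className} .wp-block-button__link`;
}

console.log(JSON.stringify(buildThemeJsonFragment("1120px", "1280px"), null, 2));
console.log(buttonPaintSelector("cta-primary"));
```

With the defaults zeroed out this way, any background, border, or padding seen on a button should trace back to the copied prototype stylesheet alone.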
📊 Performance Test Results: comparing 3ec3dd8 vs trunk for app-size, site-editor, and site-startup.
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)
I tested this PR. It kept working for more than 15 minutes after it reached 100 turns, even though I don't see that the turn limit has been increased in the code. After another ~10 minutes, it finished the site. The site isn't fully working and has many empty sections. It only worked on the home page. I could have kept going, but I feel trunk produced better results with the same amount of effort.
https://antoniosejas-wdvxm-studio.wp.build/
It's also worth mentioning that it started producing HTML, which seems to be the goal, but it also migrated HTML comments, which now appear in the editor.
I also noticed that it uses Python. I hadn’t noticed that before, but I could be wrong.
Testing this, it definitely feels more "active" than before and fixes the long lags. I used it to redesign an existing site it had already built, and the process feels snappier, whereas before there were long stretches of nothing (no messages, no sense of what it was doing, etc.).
Thanks @youknowriad! I have tested it, and I have found the following:
- Site specs worked fine, after one prompt it built the site
- The quality of the site is comparable to similar prompts in old versions of trunk. For my test case, I found a quality improvement compared to the current trunk. I did find the double border in buttons, which I tried to overcome with prompts like this one

- The time it took is slightly longer than trunk, but not a huge difference, it took ~19mins while previous tests in trunk took ~15mins
- The animations are back as they were in trunk before #3199, they are nice 👍
- The logs show that the maximum time spent in the server was ~30 seconds, well below the timeout of 240 ✅
- No prompt about maximum number of turns reached 👍
- The core HTML blocks are not perfect. The blockify skill in this PR seems to improve them, but not everything is editable

In my testing, I see an improvement from current trunk as we're fixing timeouts and improving quality compared to #3199
Closing this PR for now. I think we need a better solution; maybe we can retry this later.
Related issues
How AI was used in this PR
The system-prompt changes and the `blockify` skill in this PR were designed and written by Claude together with the PR author, using an iterative loop of prompt edit → full build session → session-JSONL audit → next edit. Each change cites the behavior it is responding to from a specific session recording. Reviewers should focus on whether the prompt wording is clear and consistent, whether the workflow described is the one we want the agent to follow, and whether the new `blockify` skill is cleanly scoped.
Proposed Changes
Local-site build: two-phase workflow
The local-site workflow in the AI agent's system prompt is now split into two explicit phases:
- Phase 1 (HTML prototype): the agent writes plain HTML/CSS/JS under `<site>/tmp/prototype/` using a section-anchor skeleton and fills one anchor per `Edit`. Design tokens are locked in a `tokens` anchor before any section fill. Writes under `wp-content/themes/` and `wp-content/plugins/` are forbidden in this phase. Phase 1 completes on a `take_screenshot` of the prototype.
- Phase 2 (port to block theme): the agent invokes the `blockify` skill as a gate, builds the block-theme skeleton, `cp`-s the prototype stylesheet into `<theme>/assets/css/main.css` and adjusts only block-DOM selectors (`.wp-block-button`, `.wp-block-image`) via small `Edit`s, translates each prototype section to block markup in `<site>/tmp/page-<slug>.html`, and applies the content via `wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'`.
The eval-based application replaces the earlier `--post_content-file=<host path>` pattern, which silently failed because wp_cli runs inside the PHP-WASM filesystem and cannot read host paths. The host site directory is mounted at `/wordpress/` in WASM, so `ABSPATH . "tmp/..."` resolves correctly.
Remote-site (WordPress.com) build: three-phase workflow
The remote-site workflow was restructured to mirror the local two-phase pattern while keeping an explicit audit step that's only meaningful on remote (plan gating + site discovery):
- Phase 1 (Audit): plan gate (`GET /`), content audit, active theme, templates, and fetching the global-styles ID for later CSS work.
- Phase 2 (Prototype): HTML/CSS/JS written to `~/.studio/tmp/prototype-<site-slug>/` (the CLI runs locally; only the WordPress.com site is remote). An explicit FORBIDDEN list blocks any `wpcom_request` POST/PUT/DELETE, plugin install, theme switch, or global-styles edit until the prototype is screenshot-approved, mirroring the local phase-gate enforcement.
- Phase 3 (Port): invoke `blockify` → translate prototype sections to block markup in a local scratch file (so re-sends work on transient POST failures) → apply content via `wpcom_request POST /posts` / `POST /posts/<id>` → apply CSS via `POST /global-styles/<id>` with `settings.custom` (paid plans only; free plans refuse) → apply template changes via `POST /templates/<id>` / `POST /template-parts/<id>` → screenshot verify.
Previously, the remote prompt had a broken `<site>/tmp/prototype/` reference (no such local path in a remote-only session), no `blockify` invocation, and pulled in the shared `WORK_CADENCE` constant whose rules (theme `cp`, ABSPATH eval, `wp_cli post_content`) don't apply to remote. The remote branch now has its own inline cadence: one content-producing `wpcom_request` per turn, GETs combinable, anti-screenshot-serialization.
New `blockify` skill
A new user-invokable skill at `apps/cli/ai/plugin/skills/blockify/SKILL.md` provides the HTML → Gutenberg block translation table, per-element block patterns, and decompose rules. It is intentionally scoped to pure conversion: it does not enumerate site content, does not rewrite CSS, and operates on any HTML input (a file, a snippet, fetched `post_content`). This is a stripped and repurposed version of the skill proposed in #3016; the site-wide audit, CSS migration, and phase-1/phase-2 orchestration concerns from that PR are moved to the system prompt or dropped.
The skill is referenced at the start of PHASE 2 (local) / PHASE 3 (remote) so the agent loads its rules into context before emitting block markup.
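To make the idea of an HTML → block translation concrete, here is a tiny sketch covering only paragraphs and headings. It does not reproduce the skill's actual table, which is not shown in this PR description; `SimpleElement` and `toBlockMarkup` are hypothetical names, and the input is assumed to be already-parsed elements rather than raw HTML.

```typescript
// Hypothetical, already-parsed input: a tag name plus its inner HTML.
type HeadingTag = "h1" | "h2" | "h3" | "h4" | "h5" | "h6";
type SimpleElement = { tag: "p" | HeadingTag; html: string };

function toBlockMarkup(el: SimpleElement): string {
  if (el.tag === "p") {
    return `<!-- wp:paragraph -->\n<p>${el.html}</p>\n<!-- /wp:paragraph -->`;
  }
  // Level 2 is the heading block's default, so it carries no attribute JSON.
  const level = Number(el.tag.slice(1));
  const attrs = level === 2 ? "" : ` {"level":${level}}`;
  return (
    `<!-- wp:heading${attrs} -->\n` +
    `<${el.tag} class="wp-block-heading">${el.html}</${el.tag}>\n` +
    `<!-- /wp:heading -->`
  );
}

const page = [
  { tag: "h1", html: "Willow Farm" },
  { tag: "p", html: "Animal visits and local products." },
] satisfies SimpleElement[];

console.log(page.map(toBlockMarkup).join("\n\n"));
```

Emitting core blocks like these, instead of wrapping sections in `<!-- wp:html -->`, is what keeps the converted content editable in the block editor.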
Working cadence rules
`WORK_CADENCE` (local) distinguishes content creation (one tool per turn, which avoids the silent 20 KB-in-one-Write generation cliff) from fix-up loops (no re-validation or re-screenshot after every individual Edit, which avoids the `Edit → validate → Edit → validate` serialization anti-pattern). The `validate_blocks` tool description in the prompt is updated in lockstep so it no longer instructs the agent to validate after every individual `Edit`; prior sessions showed the agent literally executing `Edit → validate → Edit → validate` × 10 because the tool description instructed it to "call after every file write/edit."
Skeleton-and-fill patterns are explicitly listed for prototype stylesheets, prototype HTML pages, and phase-2 block-markup page content. Anchor names are required to be composition-specific, not templated (`hero`/`features`/`cta`).
wp_cli shell-syntax prohibition restored
Earlier refactoring had dropped the explicit "wp_cli does NOT accept shell syntax" line. A recent session hit a 130 s silent hang when the agent ran `wp post get 4 --field=post_content | grep -o ...`; the pipe is shell syntax that wp_cli can't execute. The rule is restored, with the concrete failing command listed as an anti-pattern and `wp_cli eval` pointed to as the PHP-side alternative for filtering.
Editor styles
A general rule was added requiring `functions.php` to register every enqueued frontend stylesheet as an editor style too (`add_theme_support( 'editor-styles' )` + `add_editor_style( ... )`). Without this, the block editor renders unstyled content and diverges from the frontend, a regression surfaced in an earlier session.
maxTurns 50 → 100
`startAiAgent`'s default `maxTurns` is raised to 100. Section-by-section cadence adds turns by design; recent full builds completed at 102–135 assistant turns, so 100 gives headroom without being visibly different on runs that complete quickly.
Expected impact (measured against session recordings)
- … `Write` of `main.css`. Post-change runs use `cp` + a handful of ≤2 KB Edits, with no silent generation.
- … `--post_content-file=<host path>` producing empty content. Post-change runs apply on the first try via the ABSPATH eval pattern.
- … `Edit → validate → Edit → validate` × 10. Post-change runs do `Edit × N → validate` once: 11 turns for 10 fixes instead of 20.
- … `main.css` write) to ~30 s (productive `validate_blocks` processing).
Not in this PR
- … `rewrite-wp-cli-post-content.ts` to also intercept `--post_content-file=<host path>` was considered; the prompt-level fix was chosen as the lower-risk option).
- … `POST /global-styles/<id>` `settings.custom` path in particular would be valuable.
Testing Instructions
Local build
- Run `npm run cli:build`.
- Run `node apps/cli/dist/cli/main.mjs ai` and ask it to build a landing page (e.g. "Build a farm one page site, known for its animals visits and its local products. Use elegant colors and design.").
- `Skill: studio:site-spec` → `site_create` → `Bash: mkdir tmp/prototype` → prototype `style.css` + `index.html` written as small skeletons, then filled anchor-by-anchor.
- `take_screenshot` on `file:///.../tmp/prototype/index.html`.
- `Skill: studio:blockify` invoked before any block markup is written.
- `cp <site>/tmp/prototype/style.css <site>/wp-content/themes/<slug>/assets/css/main.css` via Bash, NOT a large `Write` of `main.css`.
- `theme.json`, `functions.php`, `parts/header.html`, `parts/footer.html`, `templates/index.html`, optionally `templates/front-page.html`.
- `<site>/tmp/page-home.html` created and filled section-by-section (NOT inside the theme folder).
- `wp_cli eval '$content = file_get_contents(ABSPATH . "tmp/page-home.html"); wp_update_post([...]); echo "ok";'`, NOT `--post_content-file=<host path>`.
- `wp_cli option update show_on_front page` + `wp_cli option update page_on_front <id>` (two separate calls, not shell-chained with `&&`).
- `take_screenshot http://localhost:<port>` desktop + mobile; CSS polish edits if needed.
- … `validate_blocks` or the final screenshot, the agent should apply multiple `Edit`s consecutively, then re-validate/re-screenshot once, NOT `Edit → validate → Edit → validate`.
- … `<!-- wp:html -->` (check Document Overview in the block editor), and the block editor shows styled content matching the frontend (editor styles registered correctly).
Remote build
Not yet exercised end-to-end. When testing:
- `GET /` plan check → audit GETs → `Bash: mkdir -p ~/.studio/tmp/prototype-<slug>/` → prototype Write/Edits → `take_screenshot file:///.../prototype-<slug>/index.html` → `Skill: studio:blockify` → per-page content via `wpcom_request POST /posts` or `POST /posts/<id>` → `POST /global-styles/<id>` for CSS → screenshot verify against the remote URL.
- No `wpcom_request POST/PUT/DELETE` should fire before the prototype screenshot.
- … `POST /global-styles/<id>` payload shape: the prompt recommends `{ settings: { custom: "<CSS>" }, styles: {} }` but this hasn't been exercised against a live WP.com site.
Session recordings under `~/Library/Application Support/Studio/sessions/` (macOS) are the fastest way to verify the tool sequence matches the expectations above.
Pre-merge Checklist
- … `npm run typecheck`)
- … `npm test -- apps/cli/ai`)
🤖 Generated with Claude Code