apps/cli: two-phase AI build workflow with blockify skill by youknowriad · Pull Request #3207 · Automattic/studio

youknowriad · 2026-04-23T13:20:49Z

Related issues

Related to: none (iterative refinement of the AI agent's site-build flow, supersedes the system-prompt portion of apps/cli: generate style.css and page content section-by-section #3199)

How AI was used in this PR

The system-prompt changes and the blockify skill in this PR were designed and written by Claude together with the PR author, using an iterative loop of prompt edit → full build session → session-JSONL audit → next edit. Each change cites the behavior it is responding to from a specific session recording. Reviewers should focus on whether the prompt wording is clear and consistent, whether the workflow described is the one we want the agent to follow, and whether the new blockify skill is cleanly scoped.

Proposed Changes

Local-site build: two-phase workflow

The local-site workflow in the AI agent's system prompt is now split into two explicit phases:

PHASE 1 — HTML prototype. The agent writes plain HTML/CSS/JS under <site>/tmp/prototype/ using a section-anchor skeleton and fills one anchor per Edit. Design tokens are locked in a tokens anchor before any section fill. Writes under wp-content/themes/ and wp-content/plugins/ are forbidden in this phase. Phase 1 completes on a take_screenshot of the prototype.
PHASE 2 — Port to block theme. The agent invokes the new blockify skill as a gate, builds the block-theme skeleton, cp-s the prototype stylesheet into <theme>/assets/css/main.css and adjusts only block-DOM selectors (.wp-block-button, .wp-block-image) via small Edits, translates each prototype section to block markup in <site>/tmp/page-<slug>.html, and applies the content via wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'.

The eval-based application replaces the earlier --post_content-file=<host path> pattern, which silently failed because wp_cli runs inside the PHP-WASM filesystem and cannot read host paths. The host site directory is mounted at /wordpress/ in WASM, so ABSPATH . "tmp/..." resolves correctly.
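The apply step can be sketched as a small command builder. This is a minimal sketch, not the PR's code: `buildApplyContentCommand`, the post ID, and the slug check are hypothetical. Only `file_get_contents`, `ABSPATH`, and `wp_update_post` come from the PR text or from WordPress itself.

```typescript
// Hypothetical helper: builds the argument string for a `wp_cli eval` call that
// reads block markup from the site's tmp/ folder inside the WASM filesystem,
// where ABSPATH points at the mounted site directory.
function buildApplyContentCommand(postId: number, slug: string): string {
  // Restrict the slug so it cannot break out of the PHP string literal.
  if (!/^[a-z0-9-]+$/.test(slug)) {
    throw new Error(`Unexpected slug: ${slug}`);
  }
  if (!Number.isInteger(postId) || postId <= 0) {
    throw new Error(`Unexpected post ID: ${postId}`);
  }
  const php = [
    `$content = file_get_contents(ABSPATH . "tmp/page-${slug}.html");`,
    `wp_update_post(["ID" => ${postId}, "post_content" => $content]);`,
    `echo "ok";`,
  ].join(" ");
  // Single quotes keep the PHP `$content` variable literal.
  return `eval '${php}'`;
}

// Example: the command for applying the home page to post 12 (ID is made up).
const applyHome = buildApplyContentCommand(12, "home");
```

The path is resolved by PHP at run time, so no host path ever reaches `wp_cli`. Because `wp_update_post` expects slashed input, content containing backslashes may need `wp_slash()` around `$content`; that is left out here for brevity.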

Remote-site (WordPress.com) build: three-phase workflow

The remote-site workflow was restructured to mirror the local two-phase pattern while keeping an explicit audit step that's only meaningful on remote (plan gating + site discovery):

PHASE 1 — Audit. Plan gate (GET /), content audit, active theme, templates, and fetching the global-styles ID for later CSS work.
PHASE 2 — HTML prototype. Local scratch at ~/.studio/tmp/prototype-<site-slug>/ (the CLI runs locally; only the WordPress.com site is remote). Explicit FORBIDDEN list blocks any wpcom_request POST/PUT/DELETE, plugin install, theme switch, or global-styles edit until the prototype is screenshot-approved — mirrors the local phase-gate enforcement.
PHASE 3 — Port to WordPress.com. Invoke blockify → translate prototype sections to block markup in a local scratch file (so re-sends work on transient POST failures) → apply content via wpcom_request POST /posts / POST /posts/<id> → apply CSS via POST /global-styles/<id> with settings.custom (paid plans only, free plans refuse) → apply template changes via POST /templates/<id> / POST /template-parts/<id> → screenshot verify.

Previously, the remote prompt had a broken <site>/tmp/prototype/ reference (no such local path in a remote-only session), no blockify invocation, and pulled in the shared WORK_CADENCE constant whose rules (theme cp, ABSPATH eval, wp_cli post_content) don't apply to remote. The remote branch now has its own inline cadence: one content-producing wpcom_request per turn, GETs combinable, anti-screenshot-serialization.
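The remote phase gate amounts to a simple predicate. The sketch below is illustrative only: the PR enforces the gate through prompt wording, not code, and `RemoteSessionState` and `isRemoteRequestAllowed` are hypothetical names.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

interface RemoteSessionState {
  // True once the prototype screenshot has been approved (end of PHASE 2).
  prototypeApproved: boolean;
}

// Hypothetical guard mirroring the FORBIDDEN list: reads are always allowed,
// writes to the remote site only after the prototype is approved.
function isRemoteRequestAllowed(state: RemoteSessionState, method: HttpMethod): boolean {
  if (method === "GET") {
    return true; // Audit and discovery reads are allowed in every phase.
  }
  return state.prototypeApproved;
}
```

Local `Bash`/`Write`/`Edit` calls on the prototype scratch directory are outside this check, since they never touch the remote site.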

New `blockify` skill

A new user-invokable skill at apps/cli/ai/plugin/skills/blockify/SKILL.md provides the HTML → Gutenberg block translation table, per-element block patterns, and decompose rules. It is intentionally scoped to pure conversion — it does not enumerate site content, does not rewrite CSS, and operates on any HTML input (a file, a snippet, fetched post_content). This is a stripped and repurposed version of the skill proposed in #3016; the site-wide audit, CSS migration, and phase-1/phase-2 orchestration concerns from that PR are moved to the system prompt or dropped.

The skill is referenced at the start of PHASE 2 (local) / PHASE 3 (remote) so the agent loads its rules into context before emitting block markup.

Working cadence rules

WORK_CADENCE (local) distinguishes content creation (one tool per turn — avoids the silent 20 KB-in-one-Write generation cliff) from fix-up loops (no re-validation or re-screenshot after every individual Edit — avoids the Edit → validate → Edit → validate serialization anti-pattern). The validate_blocks tool description in the prompt is updated in lockstep so it no longer instructs the agent to validate after every individual Edit — prior sessions showed the agent literally executing Edit → validate → Edit → validate × 10 because the tool description instructed it to "call after every file write/edit."

Skeleton-and-fill patterns are explicitly listed for prototype stylesheets, prototype HTML pages, and phase-2 block-markup page content. Anchor names are required to be composition-specific, not templated (hero/features/cta).
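A skeleton-and-fill step might look like the following. This is a sketch under stated assumptions: the PR does not specify the anchor syntax, so the `<!-- anchor:NAME -->` comment marker and the `fillAnchor` helper are hypothetical.

```typescript
// Assumed anchor syntax for this sketch: an HTML comment such as
// <!-- anchor:animal-visits -->. The real format is not given in the PR.
function fillAnchor(skeleton: string, anchor: string, content: string): string {
  const marker = `<!-- anchor:${anchor} -->`;
  const first = skeleton.indexOf(marker);
  if (first === -1) {
    throw new Error(`Anchor not found: ${anchor}`);
  }
  if (skeleton.indexOf(marker, first + marker.length) !== -1) {
    throw new Error(`Anchor is not unique: ${anchor}`);
  }
  // Splice rather than String.replace so "$&"-style sequences in content stay literal.
  return skeleton.slice(0, first) + content + skeleton.slice(first + marker.length);
}

// A small skeleton with composition-specific anchor names.
const skeleton = [
  "<main>",
  "<!-- anchor:animal-visits -->",
  "<!-- anchor:farm-shop-grid -->",
  "</main>",
].join("\n");

// One anchor per step, mirroring the one-fill-per-Edit cadence.
const afterVisits = fillAnchor(
  skeleton,
  "animal-visits",
  '<section class="animal-visits">Meet the animals</section>'
);
```

Requiring the anchor to be unique is what makes each fill a small, unambiguous edit, which is the property the cadence rules rely on.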

wp_cli shell-syntax prohibition restored

Earlier refactoring had dropped the explicit "wp_cli does NOT accept shell syntax" line. A recent session hit a 130 s silent hang when the agent ran wp post get 4 --field=post_content | grep -o ... — the pipe is shell syntax wp_cli can't execute. The rule is restored, with the concrete failing command listed as an anti-pattern and wp_cli eval pointed to as the PHP-side alternative for filtering.
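For illustration, the kind of command the rule prohibits can be detected with a quote-aware scan. This is a hypothetical pre-flight check, not part of the PR (which addresses the problem at the prompt level); `findShellSyntax` is an invented name and it covers only a few common operators.

```typescript
// Returns the first shell operator found outside quotes, or null if none.
// Operators inside quoted arguments (e.g. PHP passed to `eval`) are ignored.
function findShellSyntax(command: string): string | null {
  let quote: string | null = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== null) {
      if (ch === "\\" && quote === '"') {
        i++; // Skip the escaped character inside double quotes.
      } else if (ch === quote) {
        quote = null; // Closing quote.
      }
      continue;
    }
    if (ch === "'" || ch === '"') {
      quote = ch; // Opening quote.
      continue;
    }
    const pair = command.slice(i, i + 2);
    if (pair === "&&" || pair === "||" || pair === "$(") {
      return pair;
    }
    if (ch === "|" || ch === ";" || ch === ">" || ch === "<" || ch === "`") {
      return ch;
    }
  }
  return null;
}

// The failing command from the session recording is flagged on its pipe.
const flagged = findShellSyntax("post get 4 --field=post_content | grep -o x");
```

A PHP-side filter passed through `wp_cli eval` keeps the `|` inside quotes, so it is not flagged by this scan.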

Editor styles

A general rule was added requiring functions.php to register every enqueued frontend stylesheet as an editor style too (add_theme_support( 'editor-styles' ) + add_editor_style( ... )). Without this, the block editor renders unstyled content and diverges from the frontend — a regression surfaced in an earlier session.

maxTurns 50 → 100

startAiAgent's default maxTurns is raised to 100. Section-by-section cadence adds turns by design; recent full builds completed at 102–135 assistant turns, so 100 gives headroom without being visibly different on runs that complete quickly.

Expected impact (measured against session recordings)

Theme stylesheet: previous runs burned 60–90 s on a single Write of main.css. Post-change runs use cp + a handful of ≤2 KB Edits — no silent generation.
Page content apply: previous runs spent 18+ turns debugging --post_content-file=<host path> producing empty content. Post-change runs apply on the first try via the ABSPATH eval pattern.
Fix-up loops: previous runs serialized Edit → validate → Edit → validate × 10. Post-change runs do Edit × N → validate once — 11 turns for 10 fixes instead of 20.
Total session time dropped ~25 % between pre- and post-change runs of comparable prompts (~15 min → ~11.5 min).
Max per-turn gap dropped from ~76 s (silent main.css write) to ~30 s (productive validate_blocks processing).
Design fidelity: the prototype acts as the screenshot-validated reference; the theme inherits it verbatim and only adjusts for block-DOM selectors, reducing drift.
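The fix-up loop figures follow from simple counting, sketched below with hypothetical function names.

```typescript
// Turns needed for `fixes` fixes when every Edit is followed by its own validate.
function serializedTurns(fixes: number): number {
  return fixes * 2; // Edit, validate, Edit, validate, ...
}

// Turns needed when all Edits are applied first and validation runs once.
function batchedTurns(fixes: number): number {
  return fixes === 0 ? 0 : fixes + 1; // N Edits, then a single validate.
}
```

For 10 fixes this gives 20 turns serialized versus 11 batched, matching the numbers above.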

Not in this PR

No code-level change to the wp_cli wrapper (an ergonomic extension of rewrite-wp-cli-post-content.ts to also intercept --post_content-file=<host path> was considered; the prompt-level fix was chosen as the lower-risk option).
No model, thinking-config, or streaming changes.
Remote workflow has not yet been tested end-to-end against a real WordPress.com site — only structural cleanup and alignment with local. Verification of the POST /global-styles/<id> settings.custom path in particular would be valuable.

Testing Instructions

Local build

Build the CLI: npm run cli:build.
Run the AI agent on a fresh build prompt: node apps/cli/dist/cli/main.mjs ai and ask it to build a landing page (e.g. "Build a farm one page site, known for its animals visits and its local products. Use elegant colors and design.").
In the tool-use timeline you should see:
- Skill: studio:site-spec → site_create → Bash: mkdir tmp/prototype → prototype style.css + index.html written as small skeletons, then filled anchor-by-anchor.
- take_screenshot on file:///.../tmp/prototype/index.html.
- Skill: studio:blockify invoked before any block markup is written.
- cp <site>/tmp/prototype/style.css <site>/wp-content/themes/<slug>/assets/css/main.css via Bash — NOT a large Write of main.css.
- Block theme skeleton written: theme.json, functions.php, parts/header.html, parts/footer.html, templates/index.html, optionally templates/front-page.html.
- <site>/tmp/page-home.html created and filled section-by-section (NOT inside the theme folder).
- Apply step: wp_cli eval '$content = file_get_contents(ABSPATH . "tmp/page-home.html"); wp_update_post([...]); echo "ok";' — NOT --post_content-file=<host path>.
- wp_cli option update show_on_front page + wp_cli option update page_on_front <id> (two separate calls, not shell-chained with &&).
- take_screenshot http://localhost:<port> desktop + mobile; CSS polish edits if needed.
In the fix-up loop after validate_blocks or the final screenshot, the agent should apply multiple Edits consecutively, then re-validate/re-screenshot once — NOT Edit → validate → Edit → validate.
Confirm the final site looks correct, no sections were wrapped in  (check Document Overview in the block editor), and the block editor shows styled content matching the frontend (editor styles registered correctly).
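The fix-up expectation can be checked mechanically once the tool names have been pulled out of a session recording. A minimal sketch, assuming the sequence is already available as an array of tool names (the recording format is not described here, so extraction is left out); `countSerializedValidations` is a hypothetical name.

```typescript
// Counts validate_blocks calls that sit directly between two Edits:
// the Edit → validate → Edit serialization the cadence rules are meant to prevent.
function countSerializedValidations(tools: string[]): number {
  let count = 0;
  for (let i = 1; i < tools.length - 1; i++) {
    if (
      tools[i] === "validate_blocks" &&
      tools[i - 1] === "Edit" &&
      tools[i + 1] === "Edit"
    ) {
      count++;
    }
  }
  return count;
}
```

A run that follows the new cadence (several Edits, then one `validate_blocks`) should yield zero; any positive count points to the old serialized pattern.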

Remote build

Not yet exercised end-to-end. When testing:

Select a WordPress.com site on a paid plan (free plan should refuse most requests per the plan gate).
Ask for a build or redesign.
Verify the tool-use timeline shows: GET / plan check → audit GETs → Bash: mkdir -p ~/.studio/tmp/prototype-<slug>/ → prototype Write/Edits → take_screenshot file:///.../prototype-<slug>/index.html → Skill: studio:blockify → per-page content via wpcom_request POST /posts or POST /posts/<id> → POST /global-styles/<id> for CSS → screenshot verify against the remote URL.
No wpcom_request POST/PUT/DELETE should fire before the prototype screenshot.
In particular, verify the POST /global-styles/<id> payload shape — the prompt recommends { settings: { custom: "<CSS>" }, styles: {} } but this hasn't been exercised against a live WP.com site.
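The payload shape to verify can be written down as a type. This only restates the shape the prompt recommends; as noted above it has not been exercised against a live site, and `GlobalStylesPayload`, `buildGlobalStylesPayload`, and `globalStylesPath` are hypothetical names.

```typescript
// Shape recommended by the prompt: CSS goes in settings.custom, styles stays empty.
interface GlobalStylesPayload {
  settings: { custom: string };
  styles: Record<string, never>;
}

function buildGlobalStylesPayload(css: string): GlobalStylesPayload {
  return { settings: { custom: css }, styles: {} };
}

// Path for the request; the ID is the one fetched during the audit phase.
function globalStylesPath(id: number): string {
  return `/global-styles/${id}`;
}
```

When testing, comparing the body actually sent by `wpcom_request` with the output of `buildGlobalStylesPayload` is one way to confirm the agent followed the recommended shape.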

Session recordings under ~/Library/Application Support/Studio/sessions/ (macOS) are the fastest way to verify the tool sequence matches the expectations above.

Pre-merge Checklist

Have you checked for TypeScript, React or other console errors? (npm run typecheck)
Tests pass? (npm test -- apps/cli/ai)
Manual verification on at least one local AI agent build run (see Testing Instructions)
Manual verification on at least one remote AI agent build run (see Testing Instructions — in particular, the Global Styles CSS path)

🤖 Generated with Claude Code

Restructures the AI agent's site-build flow into explicit phases and adds a user-invokable skill for HTML → Gutenberg block conversion, measured against a sequence of full-build sessions to validate each change.

Phase 1: HTML prototype. The agent writes plain HTML/CSS/JS under `<site>/tmp/prototype/` with a section-anchor skeleton, then fills one anchor per Edit. Design tokens are locked in a `tokens` anchor before any section fill. Theme writes are forbidden in this phase so the design is screenshot-approved before block markup enters the picture.

Phase 2: Port to block theme. The agent invokes the new `blockify` skill (apps/cli/ai/plugin/skills/blockify/SKILL.md) as a gate before writing block markup. The theme stylesheet is `cp`-ed from the prototype and adjusted via small Edits for block-DOM selectors (`.wp-block-button`, `.wp-block-image`), replacing a prior 60–90s silent regeneration. Page content is built in `<site>/tmp/page-<slug>.html` and applied via `wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'`; the prior `--post_content-file=<host path>` pattern silently failed because wp_cli runs in a WASM filesystem that cannot see host paths.

Working cadence is split into content creation (one tool per turn to avoid silent generation cliffs) and fix-up loops (multiple Edits per turn when validate_blocks or take_screenshot report multiple issues). The validate_blocks tool description is updated in lockstep so it no longer instructs the agent to validate after every individual Edit.

`maxTurns` default is raised to 100 to give headroom for the added section-by-section turns without truncating builds.

…o-phase style

The remote-site system prompt had a handful of inherited problems from earlier iterations: PHASE 2's HTML prototype told the agent to write to `<site>/tmp/...` which doesn't exist on a remote-only session, the `${WORK_CADENCE}` include leaked local-only rules (theme `cp`, ABSPATH eval) into a context that never touches a theme file, and PHASE 3 had no `blockify` invocation before block markup was produced.

Restructured workflow while keeping the 3-phase split that's natural for remote (Audit / Prototype / Port):

- PHASE 1 Audit: plan gate + discovery + fetch global-styles ID.
- PHASE 2 Prototype: HTML/CSS/JS written to `~/.studio/tmp/prototype-<slug>/` on the local CLI machine, with an explicit FORBIDDEN list that bars any remote write (`wpcom_request` POST/PUT, plugin install, theme switch, global-styles edit) until the prototype screenshot is approved.
- PHASE 3 Port: invoke `blockify` first, translate to block markup in a local scratch file (so re-sends work if a POST fails), apply content via `wpcom_request POST /posts`, CSS via `POST /global-styles/<id>` with `settings.custom` (paid plans only), template changes via `POST /templates/<id>`, screenshot verify.

Replaced the shared `${WORK_CADENCE}` include with an inline remote cadence: one content-producing `wpcom_request` per turn, GETs combinable, local Write/Edit cadence during phase 2, anti-screenshot-serialization rule.

Narrowed the IMPORTANT line about tool restrictions: `Bash`/`Write`/`Edit` are allowed for prototype scratch files, forbidden for the remote site itself (only wpcom_request can change the site).

…is-bbce01 # Conflicts: # apps/cli/ai/agent.ts

…zation

Addresses three recurring regressions observed in recent build sessions (double button padding/borders, content centered too narrow, missing section padding). All three share a root cause: WordPress block DOM and theme.json defaults inject paint that fights prototype CSS when it is copied verbatim to the theme.

Blockify skill:

- New "CSS migration after conversion" section with concrete rules for buttons, images, groups, and padding.
- Button rule is the highest-leverage: the `.wp-block-button` wrapper gets ZERO paint; all paint (background, border, padding, color, hover) goes on `.wp-block-button.<className> .wp-block-button__link`. Prevents the classic double-border artifact caused by `className` landing on the wrapper while `wp-element-button` defaults still paint the inner.
- Image rule splits figure-level vs img-level selectors.
- Group/section rule explains `is-layout-constrained` × `contentSize` and how to match the prototype's max-width.
- Padding rule keeps section padding on the className (CSS), not on block `style.spacing.padding` attributes, and surfaces theme.json defaults as the other common culprit.

System prompt PHASE 2 step 1:

- `theme.json` now MUST set `settings.layout.contentSize`/`wideSize` from the prototype's actual max-widths, and `styles.elements.button` to neutralize `wp-element-button`'s default paint. This makes the prototype CSS the only source of truth for visual styling.
- Block-DOM adjustments list now mirrors the blockify skill's CSS migration rules, with a pointer to the skill for context.
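The theme.json requirement from this commit could be expressed as a small builder. This is a sketch under loud assumptions: the commit does not say which values neutralize the button defaults, so `transparent`, `inherit`, and `0` below are guesses, and `buildThemeJsonFragment` is a hypothetical name. The keys used (`settings.layout.contentSize`, `settings.layout.wideSize`, `styles.elements.button`) are standard theme.json paths.

```typescript
interface ThemeJsonFragment {
  settings: { layout: { contentSize: string; wideSize: string } };
  styles: {
    elements: {
      button: {
        color: { background: string; text: string };
        border: { width: string; radius: string };
        spacing: { padding: { top: string; right: string; bottom: string; left: string } };
      };
    };
  };
}

// contentSize/wideSize should be taken from the prototype's actual max-widths.
function buildThemeJsonFragment(contentSize: string, wideSize: string): ThemeJsonFragment {
  return {
    settings: { layout: { contentSize, wideSize } },
    styles: {
      elements: {
        button: {
          // Assumed neutral values so the prototype CSS is the only source of paint.
          color: { background: "transparent", text: "inherit" },
          border: { width: "0", radius: "0" },
          spacing: { padding: { top: "0", right: "0", bottom: "0", left: "0" } },
        },
      },
    },
  };
}
```

The fragment would be merged into the theme's full theme.json; `JSON.stringify(buildThemeJsonFragment("1120px", "1280px"), null, 2)` shows the resulting structure for made-up widths.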

wpmobilebot · 2026-04-23T14:20:27Z

📊 Performance Test Results

Comparing 3ec3dd8 vs trunk

app-size

Metric	trunk	`3ec3dd8`	Diff	Change
App Size (Mac)	1482.75 MB	1482.76 MB	+0.01 MB	⚪ 0.0%

site-editor

Metric	trunk	`3ec3dd8`	Diff	Change
load	1852 ms	1870 ms	+18 ms	⚪ 0.0%

site-startup

Metric	trunk	`3ec3dd8`	Diff	Change
siteCreation	8077 ms	8083 ms	+6 ms	⚪ 0.0%
siteStartup	4956 ms	4946 ms	-10 ms	⚪ 0.0%

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

sejas

I tested this PR. It kept working for more than 15 minutes after it reached 100 turns, even though I don’t see that the turn limit has been increased in the code. After another ~10 minutes, it finished the site. The site isn’t fully working and has many empty sections. It only worked on the home page. I could have kept going, but I feel trunk produced better results with the same amount of effort.

https://antoniosejas-wdvxm-studio.wp.build/

It's also worth mentioning it started producing html which seems to be the goal, but it migrated HTML comments that appear in the editor.

I also noticed that it uses Python. I hadn’t noticed that before, but I could be wrong.

annezazu · 2026-04-23T16:16:16Z

Testing this and it definitely feels more "active" than before and fixes the long lags. I used it to redesign a current site it already built and it feels snappier in the process whereas before there were long lags of nothing (no messages, no sense of what it was doing, etc).

epeicher

Thanks @youknowriad! I have tested it, and I have found the following:

Site specs worked fine, after one prompt it built the site
For my test case, the quality of the site is comparable to similar prompts in old versions of trunk, and I have found a quality improvement compared to the current trunk. I have found the double border in buttons that I tried to overcome with prompts like this one
The time it took is slightly longer than trunk, but not a huge difference, it took ~19mins while previous tests in trunk took ~15mins
The animations are back as they were in trunk before #3199, they are nice 👍
The logs show that the maximum time spent in the server was ~30 seconds, well below the timeout of 240 ✅
No prompt about maximum number of turns reached 👍
The core html blocks are not perfect, the blockify skill in this PR seems to improve but not everything is editable

In my testing, I see an improvement from current trunk as we're fixing timeouts and improving quality compared to #3199

youknowriad · 2026-04-23T18:45:03Z

Haha, funny how your experience @epeicher is different than @sejas. To be honest, I'm not sure what to do :) It's very hard to say which is better, which is worse.

youknowriad · 2026-04-24T20:01:47Z

Closing this PR for now, we need a better solution I think, maybe we can retry this later.

youknowriad added 5 commits April 23, 2026 14:19

Merge remote-tracking branch 'origin/trunk' into claude/objective-ell…

83c8c4d

…is-bbce01 # Conflicts: # apps/cli/ai/agent.ts

apps/cli: fix prettier spacing in template literal

3ec3dd8

youknowriad mentioned this pull request Apr 23, 2026

Add prompt-rule regression tests to the agent eval suite #3210

Merged

4 tasks

sejas reviewed Apr 23, 2026

epeicher reviewed Apr 23, 2026

wojtekn assigned youknowriad Apr 24, 2026

youknowriad closed this Apr 24, 2026

youknowriad deleted the claude/objective-ellis-bbce01 branch April 24, 2026 20:01

youknowriad wants to merge 5 commits into trunk from claude/objective-ellis-bbce01

5 participants
