apps/cli: two-phase AI build workflow with blockify skill#3207

Closed: youknowriad wants to merge 5 commits into trunk from claude/objective-ellis-bbce01

@youknowriad (Contributor) commented Apr 23, 2026

Related issues

How AI was used in this PR

The system-prompt changes and the blockify skill in this PR were designed and written by Claude together with the PR author, using an iterative loop of prompt edit → full build session → session-JSONL audit → next edit. Each change cites the behavior it is responding to from a specific session recording. Reviewers should focus on whether the prompt wording is clear and consistent, whether the workflow described is the one we want the agent to follow, and whether the new blockify skill is cleanly scoped.

Proposed Changes

Local-site build: two-phase workflow

The local-site workflow in the AI agent's system prompt is now split into two explicit phases:

  • PHASE 1 — HTML prototype. The agent writes plain HTML/CSS/JS under <site>/tmp/prototype/ using a section-anchor skeleton and fills one anchor per Edit. Design tokens are locked in a tokens anchor before any section fill. Writes under wp-content/themes/ and wp-content/plugins/ are forbidden in this phase. Phase 1 completes on a take_screenshot of the prototype.
  • PHASE 2 — Port to block theme. The agent invokes the new blockify skill as a gate, builds the block-theme skeleton, cp-s the prototype stylesheet into <theme>/assets/css/main.css and adjusts only block-DOM selectors (.wp-block-button, .wp-block-image) via small Edits, translates each prototype section to block markup in <site>/tmp/page-<slug>.html, and applies the content via wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'.

The eval-based application replaces the earlier --post_content-file=<host path> pattern, which silently failed because wp_cli runs inside the PHP-WASM filesystem and cannot read host paths. The host site directory is mounted at /wordpress/ in WASM, so ABSPATH . "tmp/..." resolves correctly.
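
As a minimal sketch of the eval-based apply step described above: the helper below assembles the PHP string that would be handed to `wp_cli eval`. The function name `buildApplyContentEval`, the explicit post ID parameter, and the `wp_slash()` call are assumptions for illustration; the PR itself only shows the elided `wp_update_post([...])` form.

```typescript
// Hypothetical helper (not part of this PR): builds the PHP snippet for `wp_cli eval`.
function buildApplyContentEval(slug: string, postId: number): string {
  // Restrict the slug so it can be embedded in a PHP double-quoted string safely.
  if (!/^[a-z0-9-]+$/.test(slug)) {
    throw new Error(`Unsupported slug: ${slug}`);
  }
  if (!Number.isInteger(postId) || postId <= 0) {
    throw new Error(`Invalid post ID: ${postId}`);
  }
  return [
    // ABSPATH resolves inside the WASM filesystem, where the site directory is mounted.
    `$content = file_get_contents(ABSPATH . "tmp/page-${slug}.html");`,
    `if ($content === false) { echo "read failed"; return; }`,
    // Assumption: wp_slash() is added because wp_update_post() expects slashed data.
    `wp_update_post(["ID" => ${postId}, "post_content" => wp_slash($content)]);`,
    `echo "ok";`,
  ].join(" ");
}

console.log(buildApplyContentEval("home", 12));
```

The file is read by PHP inside WASM, so no host path ever crosses the boundary. That is the property the earlier `--post_content-file` pattern lacked.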

Remote-site (WordPress.com) build: three-phase workflow

The remote-site workflow was restructured to mirror the local two-phase pattern while keeping an explicit audit step that's only meaningful on remote (plan gating + site discovery):

  • PHASE 1 — Audit. Plan gate (GET /), content audit, active theme, templates, and fetching the global-styles ID for later CSS work.
  • PHASE 2 — HTML prototype. Local scratch at ~/.studio/tmp/prototype-<site-slug>/ (the CLI runs locally; only the WordPress.com site is remote). Explicit FORBIDDEN list blocks any wpcom_request POST/PUT/DELETE, plugin install, theme switch, or global-styles edit until the prototype is screenshot-approved — mirrors the local phase-gate enforcement.
  • PHASE 3 — Port to WordPress.com. Invoke blockify → translate prototype sections to block markup in a local scratch file (so re-sends work on transient POST failures) → apply content via wpcom_request POST /posts / POST /posts/<id> → apply CSS via POST /global-styles/<id> with settings.custom (paid plans only, free plans refuse) → apply template changes via POST /templates/<id> / POST /template-parts/<id> → screenshot verify.
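
The phase gate above is enforced through prompt wording, not code. Purely as an illustration of the rule, a sketch of the same check might look like the following; `GateState` and `isRemoteCallAllowed` are hypothetical names that do not exist in the PR.

```typescript
// Hypothetical illustration of the FORBIDDEN list: reads are always allowed,
// writes only after the prototype screenshot has been approved.
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

interface GateState {
  prototypeApproved: boolean;
}

function isRemoteCallAllowed(method: HttpMethod, state: GateState): boolean {
  if (method === "GET") {
    return true; // audit and discovery requests never change the remote site
  }
  return state.prototypeApproved;
}

console.log(isRemoteCallAllowed("POST", { prototypeApproved: false })); // false
```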

Previously, the remote prompt had a broken <site>/tmp/prototype/ reference (no such local path in a remote-only session), no blockify invocation, and pulled in the shared WORK_CADENCE constant whose rules (theme cp, ABSPATH eval, wp_cli post_content) don't apply to remote. The remote branch now has its own inline cadence: one content-producing wpcom_request per turn, GETs combinable, anti-screenshot-serialization.

New blockify skill

A new user-invokable skill at apps/cli/ai/plugin/skills/blockify/SKILL.md provides the HTML → Gutenberg block translation table, per-element block patterns, and decompose rules. It is intentionally scoped to pure conversion — it does not enumerate site content, does not rewrite CSS, and operates on any HTML input (a file, a snippet, fetched post_content). This is a stripped and repurposed version of the skill proposed in #3016; the site-wide audit, CSS migration, and phase-1/phase-2 orchestration concerns from that PR are moved to the system prompt or dropped.

The skill is referenced at the start of PHASE 2 (local) / PHASE 3 (remote) so the agent loads its rules into context before emitting block markup.

Working cadence rules

WORK_CADENCE (local) distinguishes content creation (one tool per turn — avoids the silent 20 KB-in-one-Write generation cliff) from fix-up loops (no re-validation or re-screenshot after every individual Edit — avoids the Edit → validate → Edit → validate serialization anti-pattern). The validate_blocks tool description in the prompt is updated in lockstep so it no longer instructs the agent to validate after every individual Edit — prior sessions showed the agent literally executing Edit → validate → Edit → validate × 10 because the tool description instructed it to "call after every file write/edit."
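
To make the cost of the serialization anti-pattern concrete, here is a small sketch of the turn arithmetic, assuming one tool call per turn as in the sessions described. The function names are illustrative only.

```typescript
// Edit → validate → Edit → validate: every fix costs an Edit turn plus a validate turn.
function serializedTurns(fixes: number): number {
  return fixes * 2;
}

// Edit × N → validate once: N Edit turns plus a single validation at the end.
function batchedTurns(fixes: number): number {
  return fixes === 0 ? 0 : fixes + 1;
}

console.log(serializedTurns(10), batchedTurns(10)); // 20 11
```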

Skeleton-and-fill patterns are explicitly listed for prototype stylesheets, prototype HTML pages, and phase-2 block-markup page content. Anchor names are required to be composition-specific, not templated (hero/features/cta).
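
A rough sketch of one skeleton-and-fill step follows. The `<!-- anchor:NAME -->` marker format and the `fillAnchor` helper are assumptions made for this example; the PR does not specify the anchor syntax, and in practice the agent performs this step through its Edit tool.

```typescript
// Hypothetical anchor format: <!-- anchor:NAME -->. Replaces exactly one anchor with content.
function fillAnchor(skeleton: string, name: string, content: string): string {
  const marker = `<!-- anchor:${name} -->`;
  const first = skeleton.indexOf(marker);
  if (first === -1) {
    throw new Error(`Anchor not found: ${name}`);
  }
  // Anchors must be unique so each Edit touches one section only.
  if (skeleton.indexOf(marker, first + marker.length) !== -1) {
    throw new Error(`Anchor is not unique: ${name}`);
  }
  return skeleton.slice(0, first) + content + skeleton.slice(first + marker.length);
}

const skeleton = "<main>\n<!-- anchor:barn-hero -->\n<!-- anchor:farm-shop -->\n</main>";
console.log(fillAnchor(skeleton, "barn-hero", "<section>Visit the animals</section>"));
```

The anchor names in the usage example are composition-specific on purpose, in line with the rule above.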

wp_cli shell-syntax prohibition restored

Earlier refactoring had dropped the explicit "wp_cli does NOT accept shell syntax" line. A recent session hit a 130 s silent hang when the agent ran wp post get 4 --field=post_content | grep -o ... — the pipe is shell syntax wp_cli can't execute. The rule is restored, with the concrete failing command listed as an anti-pattern and wp_cli eval pointed to as the PHP-side alternative for filtering.
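
The restored rule lives in the prompt only. For illustration, a pre-flight check for the same problem could be sketched as below; `findShellSyntax` is a hypothetical helper, not something this PR adds, and its quote handling is deliberately simplified.

```typescript
// Hypothetical pre-flight check: returns the first shell operator found outside
// quotes, or null when the command looks like plain wp_cli arguments.
function findShellSyntax(command: string): string | null {
  let quote: string | null = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== null) {
      // Inside double quotes a backslash escapes the next character.
      if (ch === "\\" && quote === '"') { i++; continue; }
      if (ch === quote) quote = null;
      continue;
    }
    if (ch === '"' || ch === "'") { quote = ch; continue; }
    if (ch === "\\") { i++; continue; }
    const two = command.slice(i, i + 2);
    if (two === "&&" || two === "||" || two === "$(") return two;
    if (ch === "|" || ch === ";" || ch === ">" || ch === "<" || ch === "`") return ch;
  }
  return null;
}

console.log(findShellSyntax("post get 4 --field=post_content | grep -o foo")); // "|"
```

Quoted PHP passed to `eval` is left alone, so the PHP-side filtering alternative is not flagged.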

Editor styles

A general rule was added requiring functions.php to register every enqueued frontend stylesheet as an editor style too (add_theme_support( 'editor-styles' ) + add_editor_style( ... )). Without this, the block editor renders unstyled content and diverges from the frontend — a regression surfaced in an earlier session.

maxTurns 50 → 100

startAiAgent's default maxTurns is raised to 100. Section-by-section cadence adds turns by design; recent full builds completed at 102–135 assistant turns, so 100 gives headroom without being visibly different on runs that complete quickly.

Expected impact (measured against session recordings)

  • Theme stylesheet: previous runs burned 60–90 s on a single Write of main.css. Post-change runs use cp + a handful of ≤2 KB Edits — no silent generation.
  • Page content apply: previous runs spent 18+ turns debugging --post_content-file=<host path> producing empty content. Post-change runs apply on the first try via the ABSPATH eval pattern.
  • Fix-up loops: previous runs serialized Edit → validate → Edit → validate × 10. Post-change runs do Edit × N → validate once — 11 turns for 10 fixes instead of 20.
  • Total session time dropped ~25 % between pre- and post-change runs of comparable prompts (~15 min → ~11.5 min).
  • Max per-turn gap dropped from ~76 s (silent main.css write) to ~30 s (productive validate_blocks processing).
  • Design fidelity: the prototype acts as the screenshot-validated reference; the theme inherits it verbatim and only adjusts for block-DOM selectors, reducing drift.

Not in this PR

  • No code-level change to the wp_cli wrapper (an ergonomic extension of rewrite-wp-cli-post-content.ts to also intercept --post_content-file=<host path> was considered; the prompt-level fix was chosen as the lower-risk option).
  • No model, thinking-config, or streaming changes.
  • Remote workflow has not yet been tested end-to-end against a real WordPress.com site — only structural cleanup and alignment with local. Verification of the POST /global-styles/<id> settings.custom path in particular would be valuable.

Testing Instructions

Local build

  1. Build the CLI: npm run cli:build.
  2. Run the AI agent on a fresh build prompt: node apps/cli/dist/cli/main.mjs ai and ask it to build a landing page (e.g. "Build a farm one page site, known for its animals visits and its local products. Use elegant colors and design.").
  3. In the tool-use timeline you should see:
    • Skill: studio:site-spec → site_create → Bash: mkdir tmp/prototype → prototype style.css + index.html written as small skeletons, then filled anchor-by-anchor.
    • take_screenshot on file:///.../tmp/prototype/index.html.
    • Skill: studio:blockify invoked before any block markup is written.
    • cp <site>/tmp/prototype/style.css <site>/wp-content/themes/<slug>/assets/css/main.css via Bash — NOT a large Write of main.css.
    • Block theme skeleton written: theme.json, functions.php, parts/header.html, parts/footer.html, templates/index.html, optionally templates/front-page.html.
    • <site>/tmp/page-home.html created and filled section-by-section (NOT inside the theme folder).
    • Apply step: wp_cli eval '$content = file_get_contents(ABSPATH . "tmp/page-home.html"); wp_update_post([...]); echo "ok";' — NOT --post_content-file=<host path>.
    • wp_cli option update show_on_front page + wp_cli option update page_on_front <id> (two separate calls, not shell-chained with &&).
    • take_screenshot http://localhost:<port> desktop + mobile; CSS polish edits if needed.
  4. In the fix-up loop after validate_blocks or the final screenshot, the agent should apply multiple Edits consecutively, then re-validate/re-screenshot once — NOT Edit → validate → Edit → validate.
  5. Confirm the final site looks correct, no sections were wrapped in <!-- wp:html --> (check Document Overview in the block editor), and the block editor shows styled content matching the frontend (editor styles registered correctly).

Remote build

Not yet exercised end-to-end. When testing:

  1. Select a WordPress.com site on a paid plan (free plan should refuse most requests per the plan gate).
  2. Ask for a build or redesign.
  3. Verify the tool-use timeline shows: GET / plan check → audit GETs → Bash: mkdir -p ~/.studio/tmp/prototype-<slug>/ → prototype Write/Edits → take_screenshot file:///.../prototype-<slug>/index.html → Skill: studio:blockify → per-page content via wpcom_request POST /posts or POST /posts/<id> → POST /global-styles/<id> for CSS → screenshot verify against the remote URL.
  4. No wpcom_request POST/PUT/DELETE should fire before the prototype screenshot.
  5. In particular, verify the POST /global-styles/<id> payload shape — the prompt recommends { settings: { custom: "<CSS>" }, styles: {} } but this hasn't been exercised against a live WP.com site.
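
For step 5, the payload shape the prompt recommends can be written down as a small typed builder. This only mirrors the shape quoted above; as noted, that shape has not been verified against a live WordPress.com site, and `buildGlobalStylesPayload` is a hypothetical name.

```typescript
// Mirrors the recommended (unverified) body for POST /global-styles/<id>.
interface GlobalStylesPayload {
  settings: { custom: string };
  styles: Record<string, unknown>;
}

function buildGlobalStylesPayload(css: string): GlobalStylesPayload {
  return { settings: { custom: css }, styles: {} };
}

console.log(JSON.stringify(buildGlobalStylesPayload("body{margin:0}")));
```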

Session recordings under ~/Library/Application Support/Studio/sessions/ (macOS) are the fastest way to verify the tool sequence matches the expectations above.

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors? (npm run typecheck)
  • Tests pass? (npm test -- apps/cli/ai)
  • Manual verification on at least one local AI agent build run (see Testing Instructions)
  • Manual verification on at least one remote AI agent build run (see Testing Instructions — in particular, the Global Styles CSS path)

🤖 Generated with Claude Code

Restructures the AI agent's site-build flow into explicit phases and
adds a user-invokable skill for HTML → Gutenberg block conversion,
measured against a sequence of full-build sessions to validate each
change.

Phase 1 — HTML prototype. The agent writes plain HTML/CSS/JS under
`<site>/tmp/prototype/` with a section-anchor skeleton, then fills
one anchor per Edit. Design tokens are locked in a `tokens` anchor
before any section fill. Theme writes are forbidden in this phase so
the design is screenshot-approved before block markup enters the
picture.

Phase 2 — Port to block theme. The agent invokes the new `blockify`
skill (apps/cli/ai/plugin/skills/blockify/SKILL.md) as a gate before
writing block markup. The theme stylesheet is `cp`-ed from the
prototype and adjusted via small Edits for block-DOM selectors
(`.wp-block-button`, `.wp-block-image`), replacing a prior 60–90s
silent regeneration. Page content is built in
`<site>/tmp/page-<slug>.html` and applied via
`wp_cli eval '... file_get_contents(ABSPATH . "tmp/page-<slug>.html") ...'`
— the prior `--post_content-file=<host path>` pattern silently
failed because wp_cli runs in a WASM filesystem that cannot see
host paths.

Working cadence is split into content creation (one tool per turn to
avoid silent generation cliffs) and fix-up loops (multiple Edits per
turn when validate_blocks or take_screenshot report multiple issues).
The validate_blocks tool description is updated in lockstep so it no
longer instructs the agent to validate after every individual Edit.

`maxTurns` default is raised to 100 to give headroom for the added
section-by-section turns without truncating builds.
…o-phase style

The remote-site system prompt had a handful of inherited problems from earlier
iterations: PHASE 2's HTML prototype told the agent to write to `<site>/tmp/...`
which doesn't exist on a remote-only session, the `${WORK_CADENCE}` include
leaked local-only rules (theme `cp`, ABSPATH eval) into a context that never
touches a theme file, and PHASE 3 had no `blockify` invocation before block
markup was produced.

Restructured workflow while keeping the 3-phase split that's natural for
remote (Audit / Prototype / Port):

- PHASE 1 Audit: plan gate + discovery + fetch global-styles ID.
- PHASE 2 Prototype: HTML/CSS/JS written to `~/.studio/tmp/prototype-<slug>/`
  on the local CLI machine, with an explicit FORBIDDEN list that bars any
  remote write (`wpcom_request` POST/PUT, plugin install, theme switch,
  global-styles edit) until the prototype screenshot is approved.
- PHASE 3 Port: invoke `blockify` first, translate to block markup in a
  local scratch file (so re-sends work if a POST fails), apply content via
  `wpcom_request POST /posts`, CSS via `POST /global-styles/<id>` with
  `settings.custom` (paid plans only), template changes via
  `POST /templates/<id>`, screenshot verify.

Replaced the shared `${WORK_CADENCE}` include with an inline remote cadence:
one content-producing `wpcom_request` per turn, GETs combinable, local
Write/Edit cadence during phase 2, anti-screenshot-serialization rule.

Narrowed the IMPORTANT line about tool restrictions: `Bash`/`Write`/`Edit`
are allowed for prototype scratch files, forbidden for the remote site
itself (only wpcom_request can change the site).
…is-bbce01

# Conflicts:
#	apps/cli/ai/agent.ts
…zation

Addresses three recurring regressions observed in recent build sessions
(double button padding/borders, content centered too narrow, missing
section padding). All three share a root cause: WordPress block DOM and
theme.json defaults inject paint that fights prototype CSS when it is
copied verbatim to the theme.

Blockify skill:
- New "CSS migration after conversion" section with concrete rules for
  buttons, images, groups, and padding.
- Button rule is the highest-leverage: the `.wp-block-button` wrapper
  gets ZERO paint; all paint (background, border, padding, color, hover)
  goes on `.wp-block-button.<className> .wp-block-button__link`. Prevents
  the classic double-border artifact caused by `className` landing on
  the wrapper while `wp-element-button` defaults still paint the inner.
- Image rule splits figure-level vs img-level selectors.
- Group/section rule explains `is-layout-constrained` × `contentSize`
  and how to match the prototype's max-width.
- Padding rule keeps section padding on the className (CSS), not on
  block `style.spacing.padding` attributes, and surfaces theme.json
  defaults as the other common culprit.

System prompt PHASE 2 step 1:
- `theme.json` now MUST set `settings.layout.contentSize`/`wideSize`
  from the prototype's actual max-widths, and `styles.elements.button`
  to neutralize `wp-element-button`'s default paint. This makes the
  prototype CSS the only source of truth for visual styling.
- Block-DOM adjustments list now mirrors the blockify skill's CSS
  migration rules, with a pointer to the skill for context.
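
A sketch of the `theme.json` fragment this commit message describes, built from the prototype's max-widths. The key paths follow the theme.json schema, but the specific neutralizing values for `styles.elements.button` are assumptions chosen for illustration and are not taken from the PR.

```typescript
// Hypothetical builder: derives layout sizes from prototype max-widths (in px)
// and resets button paint so the prototype CSS is the only source of styling.
function buildThemeJsonFragment(contentMaxWidthPx: number, wideMaxWidthPx: number) {
  if (wideMaxWidthPx < contentMaxWidthPx) {
    throw new Error("wideSize must not be smaller than contentSize");
  }
  return {
    settings: {
      layout: {
        contentSize: `${contentMaxWidthPx}px`,
        wideSize: `${wideMaxWidthPx}px`,
      },
    },
    styles: {
      elements: {
        button: {
          // Assumed reset values; adjust to whatever the prototype CSS expects.
          color: { background: "transparent", text: "inherit" },
          border: { width: "0", radius: "0" },
          spacing: { padding: { top: "0", right: "0", bottom: "0", left: "0" } },
        },
      },
    },
  };
}

console.log(JSON.stringify(buildThemeJsonFragment(720, 1200), null, 2));
```
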
@wpmobilebot
Collaborator

📊 Performance Test Results

Comparing 3ec3dd8 vs trunk

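app-size

| Metric | trunk | 3ec3dd8 | Diff | Change |
| --- | --- | --- | --- | --- |
| App Size (Mac) | 1482.75 MB | 1482.76 MB | +0.01 MB | ⚪ 0.0% |

site-editor

| Metric | trunk | 3ec3dd8 | Diff | Change |
| --- | --- | --- | --- | --- |
| load | 1852 ms | 1870 ms | +18 ms | ⚪ 0.0% |

site-startup

| Metric | trunk | 3ec3dd8 | Diff | Change |
| --- | --- | --- | --- | --- |
| siteCreation | 8077 ms | 8083 ms | +6 ms | ⚪ 0.0% |
| siteStartup | 4956 ms | 4946 ms | -10 ms | ⚪ 0.0% |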

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

Member

@sejas sejas left a comment

I tested this PR. It kept working for more than 15 minutes after it reached 100 turns, even though I don’t see that the turn limit has been increased in the code. After another ~10 minutes, it finished the site. The site isn’t fully working and has many empty sections. It only worked on the home page. I could have kept going, but I feel trunk produced better results with the same amount of effort.

Image

https://antoniosejas-wdvxm-studio.wp.build/

Image

It's also worth mentioning that it started producing HTML, which seems to be the goal, but it migrated HTML comments that appear in the editor.

Image

I also noticed that it uses Python. I hadn’t noticed that before, but I could be wrong.

Image

@annezazu

Testing this and it definitely feels more "active" than before and fixes the long lags. I used it to redesign a current site it already built and it feels snappier in the process whereas before there were long lags of nothing (no messages, no sense of what it was doing, etc).

Contributor

@epeicher epeicher left a comment

Thanks @youknowriad! I have tested it, and I have found the following:

  • Site specs worked fine, after one prompt it built the site
  • For my test case, the quality of the site is comparable to similar prompts in old versions of trunk, and I found a quality improvement compared to the current trunk. I did find the double border in buttons, which I tried to overcome with prompts like this one: [image]
  • The time it took is slightly longer than trunk, but not a huge difference, it took ~19mins while previous tests in trunk took ~15mins
  • The animations are back as they were in trunk before #3199, they are nice 👍
  • The logs show that the maximum time spent in the server was ~30 seconds, well below the timeout of 240 ✅
  • No prompt about maximum number of turns reached 👍
  • The core HTML blocks are not perfect; the blockify skill in this PR seems to improve things, but not everything is editable: [image]

In my testing, I see an improvement from current trunk as we're fixing timeouts and improving quality compared to #3199

@youknowriad
Contributor Author

Haha, funny how your experience @epeicher is different than @sejas. To be honest, I'm not sure what to do :) It's very hard to say which is better, which is worse.

@youknowriad
Contributor Author

Closing this PR for now, we need a better solution I think, maybe we can retry this later.

@youknowriad youknowriad deleted the claude/objective-ellis-bbce01 branch April 24, 2026 20:01