[Data Liberation] Add HTML to Blocks converter (#2095)
Adds a basic `WP_HTML_To_Blocks` class that accepts HTML and outputs
block markup.
It's a very basic converter. It only considers the markup and won't
consider any visual changes introduced via CSS or JavaScript. Only a few
core blocks are supported in this initial PR. The API can easily support
more HTML elements and blocks.
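
To illustrate what "considering only the markup" can mean, here is a minimal, self-contained sketch of a tag-to-block lookup. It is not the code in this PR: `EXAMPLE_TAG_TO_BLOCK` and `example_wrap_element()` are hypothetical names, and the sketch assumes that unknown elements can fall back to the Custom HTML block.

```php
<?php
// Hypothetical lookup table, for illustration only. It is NOT the mapping
// used inside WP_HTML_To_Blocks. Most other core blocks expect extra classes
// or attributes on the element, so they are left out of this sketch.
const EXAMPLE_TAG_TO_BLOCK = array(
	'p' => 'paragraph',
);

/**
 * Wraps one top-level element in block comment delimiters.
 * Unknown tags fall back to the Custom HTML block (wp:html).
 */
function example_wrap_element( string $tag, string $outer_html ): string {
	$block = EXAMPLE_TAG_TO_BLOCK[ strtolower( $tag ) ] ?? 'html';
	return "<!-- wp:{$block} -->\n{$outer_html}\n<!-- /wp:{$block} -->";
}

echo example_wrap_element( 'p', '<p>Hello <b>world</b>!</p>' ), "\n";
echo example_wrap_element( 'marquee', '<marquee>Hi</marquee>' ), "\n";
```

In a design like this, supporting another element would come down to adding a table entry plus whatever markup that block expects.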
To preserve visual fidelity between the original HTML page and the
produced block markup, we'll need an annotated HTML input produced by
the [Try WordPress](https://github.com/WordPress/try-wordpress/) browser
extension. It would contain each element's colors, sizes, etc. We cannot
possibly get all of that from just analyzing the HTML on the server without
building a full-blown, browser-like HTML renderer in PHP, and I know I'm
not building one.
A part of #1894
## Example
```php
$html = <<<HTML
<meta name="post_title" content="My first post">
<p>Hello <b>world</b>!</p>
HTML;
$converter = new WP_HTML_To_Blocks( $html );
$converter->convert();
var_dump( $converter->get_all_metadata() );
/*
* array( 'post_title' => array( 'My first post' ) )
*/
var_dump( $converter->get_block_markup() );
/*
* <!-- wp:paragraph -->
* <p>Hello <b>world</b>!</p>
* <!-- /wp:paragraph -->
*/
```
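
The metadata comes back as a list of values per key, as the `var_dump()` output above shows. A caller that only wants one value per key could flatten it along these lines. This is a sketch under that assumption; `example_first_meta_values()` is a hypothetical helper, not part of this PR.

```php
<?php
/**
 * Hypothetical helper: keeps only the first value for each metadata key.
 * Assumes the shape shown above, e.g. array( 'post_title' => array( 'My first post' ) ).
 */
function example_first_meta_values( array $metadata ): array {
	$flat = array();
	foreach ( $metadata as $key => $values ) {
		// Skip keys that carry no values at all.
		if ( is_array( $values ) && count( $values ) > 0 ) {
			$flat[ $key ] = reset( $values );
		}
	}
	return $flat;
}

// Sample data copied from the example output above.
$metadata = array( 'post_title' => array( 'My first post' ) );
var_dump( example_first_meta_values( $metadata ) );
```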
## Caveats
I had to patch `WP_HTML_Processor` to stop bailing out on `<meta>` tags
referencing the document charset. Ideally we'd patch WordPress core to
stop bailing out when the charset is UTF-8.
## Testing instructions
This PR mostly adds new code. Just confirm the unit tests pass in CI.
cc @brandonpayton @zaerl @sirreal @dmsnell @ellatrix