Migrate from Wattsi to Bikeshed #297

Open
foolip opened this issue Mar 14, 2025 · 16 comments

@foolip
Member

foolip commented Mar 14, 2025

The HTML spec is built using Wattsi, which differs in many ways from Bikeshed. The differences can be a bit of a hassle for people who are used to Bikeshed, although one does get used to Wattsi's syntax after a while. Bikeshed source can be more compact when autolinks and some Markdown syntax are used, but doesn't have to be. See whatwg/html#11026 for an initial discussion of this topic.

Known blockers/requirements:

  • Performance. @tabatkins has a plan to make Bikeshed's parser pass much faster using lxml.
  • A tool to faithfully convert the source to Bikeshed. In order to support converting in-flight PRs, it has to work reliably over time and not just at a single point in time. I have been experimenting in Add experimental support to build using Bikeshed #296 to understand what this would entail.
  • Stability guarantees so that the HTML build isn't broken by Bikeshed changes. This can hopefully be addressed in Bikeshed's CI.
  • Multiple document support: speced/bikeshed#269

Discuss!

cc @annevk @domenic @domfarolino @sideshowbarker @zcorpan

@foolip
Member Author

foolip commented Mar 14, 2025

Some notes on cross-references in Wattsi and Bikeshed after experimenting in #296.

In Wattsi:

  • <dfn>s have a single "topic" string coming from either a data-x attribute or the text content. The topic string isn't necessarily human readable; it's often the intended ID. The data-x="" attribute can appear on a child element, and is effectively hoisted. The for attribute is translated to data-dfn-for but otherwise ignored.
  • Links are created by <span>, <i>, and <code> elements. There are subtle differences, but basically the "topic" is extracted and the <dfn> looked up using that.

Typical Wattsi usage:

Something defined as <dfn data-x="concept-response-type">type</dfn>.

It is linked as <span data-x="concept-response-type">type</span>.

In Bikeshed:

  • <dfn>s have a set of "linking texts" extracted from the lt attribute (split on "|") or the text content. The for attribute is typically used.
  • Links are created by <a>foo</a> or shorthands like [=foo=] for concepts and {{foo}} for code.

Typical Bikeshed usage:

Something defined as <dfn for=response>type</dfn>.

It is linked as <a for=response>type</a> or [=response/type=], or [=type=] if unambiguous.

Many cases can be translated just fine.

The biggest challenge so far is translating the pattern <dfn export data-x="concept-response-type">type</dfn>. <dfn export lt="concept-response-type">type</dfn> doesn't work because the output would be data-lt="concept-response-type", and we shouldn't change the exported linking texts. The workaround I've found is using local-lt="concept-response-type", but it's more verbose and incompatible with compact links to the definition like [=response/type=].

I think some heuristics may be needed to add for attributes, but I don't know how reliable it can be.
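
To make the translation discussion above a bit more concrete, here is a minimal sketch (not the actual #296 converter; the file path is a placeholder and the for heuristic is left open) of a conversion pass that keeps each Wattsi topic string as the element's ID while leaving the visible text as the Bikeshed linking text:

import lxml.html

def convert_dfns(path):
    # Rough sketch only: preserve Wattsi topic strings as IDs.
    tree = lxml.html.parse(path)
    for dfn in tree.iter("dfn"):
        topic = dfn.get("data-x")
        if not topic:
            continue  # <dfn data-x=""> isn't processed by Wattsi; plain <dfn> needs no change
        dfn.set("id", topic)        # keep the Wattsi-style ID, e.g. concept-response-type
        del dfn.attrib["data-x"]    # the linking text is now just the text content
        # A real converter would also need to pick a for="" value here
        # (e.g. "response" for "concept-response-type"); that's the
        # heuristic part mentioned above.
    return tree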

@annevk
Member

annevk commented Mar 14, 2025

We already annotate exported <dfn>s with appropriate for attributes and such today. How much work would it be to annotate the remainder of <dfn> elements in the same manner? Because then we have an answer for how each data-x is to be converted (and we need to keep that string as well, to be clear, but as ID).

@tabatkins

Yeah, I think all those are gonna need a manual pass to turn them into for attributes, so <dfn id=concept-response-type for=response>type</dfn>. A bunch are already done that way, and it'll be a strict improvement for the ecosystem to have them namespaced better anyway.

Links are created by <span>, <i>, and <code> elements. There are subtle differences

What are the differences? Is it enough to just replace span with a, and i/code with an a child? (Or parent, whatever Wattsi currently does for those when it adds links.)

@tabatkins

Performance. @tabatkins has a plan to make Bikeshed's parser pass much faster using lxml.

To be clear here, after a bit of tweaking a few weeks ago to kill some accidental quadratics, Bikeshed is already on par with Wattsi for performance. Exact numbers depend on environment, but it seems to process in ~1min, which is about how long Wattsi takes. A decent chunk (~20%?) of that is due to the duplicate parsing pass I'm still performing, which'll go away with the conclusion of the parser-rewrite project this year. There's even more potential for improvement with a larger project I have in the wings to replace LXML as the tree model entirely, but that might not happen this year, and if it does it'll be late in the year.

In order to support converting in-flight PRs, it has to work reliably over time and not just at a single point in time. I have been experimenting in #296 to understand what this would entail.

I feel like this isn't actually required, at least not to the extent that this text makes it seem. Converting in-flight PRs is more or less a one-time cost; after that, the source and all extant PRs are all in Bikeshed and we don't have to worry about it again. Plus, most PRs are fairly small; even if we chose to commit to "Tab manually converts all extant PRs by hand", it's not much work.

The only reason it would have to work "over time" is if you're concerned about edits that are currently in the process of being written, but haven't yet been committed to the repo as a PR, at the time of the switchover. Again, this seems like a fairly small set of things, and amenable to just handling manually as needed.

[blockers]

The other big blocker is multipage output, which I'm working on right now and expect to have done by the end of the month.

Stability guarantees so that the HTML build isn't broken by Bikeshed changes. This can hopefully be addressed in Bikeshed's CI.

Yeah, I already test a whole bunch of specs as part of my CI, so I can see when something is going to break. HTML would definitely become a part of that.

@foolip
Member Author

foolip commented Mar 14, 2025

We already annotate exported <dfn>s with appropriate for attributes and such today. How much work would it be to annotate the remainder of <dfn> elements in the same manner? Because then we have an answer for how each data-x is to be converted

There's a total of 6794 <dfn> elements now, excluding <dfn data-x=""> which aren't processed. Breakdown:

  • 4298 with neither export nor for
  • 2069 without export but with for
  • 290 with export but no for (is this the category that was ~done?)
  • 137 with both export and for

So that's over 4500 <dfn>s with no for attribute.
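
For reference, a breakdown like this can be tallied with a few lines of lxml; this is just a sketch of the approach (the source path and the exact filtering are assumptions), not necessarily how the numbers above were produced:

from collections import Counter
import lxml.html

tree = lxml.html.parse("source")  # path to the Wattsi source file
counts = Counter()
for dfn in tree.iter("dfn"):
    if dfn.get("data-x") == "":
        continue  # <dfn data-x=""> isn't processed, so exclude it
    counts[("export" in dfn.attrib, "for" in dfn.attrib)] += 1
print(counts)  # keyed by (has export, has for)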

(and we need to keep that string as well, to be clear, but as ID).

Yeah, this is what I've done. I think it'll mean a lot of IDs in the source, however, so I'm thinking about an HTML mode for Bikeshed to generate IDs in the style of the HTML spec, so that the generated IDs are correct more of the time.

@domenic
Member

domenic commented Mar 14, 2025

~1min, which is about how long Wattsi takes.

This is not accurate. Wattsi takes ~4 seconds. Even the build server version (which includes upload/download times and is running on a relatively slow machine) takes 14 seconds.

@tabatkins

Huh, I was basing those assumptions on the runtime of the "build" step in https://github.com/whatwg/html/actions/runs/13847732392/job/38749434294, but it does look like that's doing a bunch of other stuff. There's no sub-step timing information, so I can't tell what's what. Any chance of a breakdown in the relative times of those things, so I have a better benchmark?

@domenic
Member

domenic commented Mar 14, 2025

I imagine most of that time is spent on:

  • Downloading Docker images and setting up Docker environments
  • The HTML validator
  • The Prince PDF generator, including setting up a server to serve the files

The CI time is not really relevant as CI builds are not what people iterate based on. I'd suggest doing local Wattsi builds and timing those.

Here's an example on one of my slower machines.

domenic@Domenic-White:/mnt/c/Users/d/OneDrive - domenic.me/Code/GitHub/whatwg/html$ time wattsi --single-page-only source deadb33f ../html-build/out default ../html-build/.cache/mdn-spec-links-html.json
Parsing MDN data...
Parsing...
Generating HTML variant...
Saving index-html

real    0m10.117s
user    0m2.269s
sys     0m0.600s

@tabatkins

Cool, I'll check that out.

I will say, tho, that 4s is, uh, not obtainable with Python. Absolutely out of the question. Even if Bikeshed did literally nothing but run the source thru the initial html parser and then immediately quit, that's currently about 20s of time for a document the size of HTML.

My planned projects will be knocking substantial chunks off the runtime still. I think 40s is very obtainable by the end of the year, and possibly down to 30s. I've also done zero real perf hacking on the parser so far, and am very interested in what possibilities might exist there. But it would be good to know, for the sake of this issue, whether a 1min build time is an acceptable tradeoff for the benefit of moving to a more widely-used build system.

(A longer-term project is, of course, learning Rust and porting Bikeshed to it. That should give some massive benefits. If it's important I can bump the priority of trying to do that, but it'll still be a longer-term project, with a 2026 completion likely.)

@domenic
Member

domenic commented Mar 14, 2025

I will say, tho, that 4s is, uh, not obtainable with Python. Absolutely out of the question. Even if Bikeshed did literally nothing but run the source thru the initial html parser and then immediately quit, that's currently about 20s of time for a document the size of HTML.

Yeah, I sympathize, but that's why I think it's always been a long shot to consider moving HTML to Bikeshed. (Instead of, e.g., improving Wattsi to accept more Bikeshed syntax.)

But it would be good to know, for the sake of this issue, whether a 1min build time is an acceptable tradeoff for the benefit of moving to a more widely-used build system.

It would not be. We do frequent rapid iteration and need our build tools to support this. I think 15 seconds on a slow machine like the one I'm testing now would be my hard limit.

I think we can instead get many of the same benefits from projects that improve the existing HTML build tooling so that people writing the HTML spec don't need to be as aware of the differences, or deal with the more burdensome things that Bikeshed-style preprocessors can do for you. E.g., #293.


Separate from performance:

An issue we discussed at WHATNOT is that just doing a minimal conversion isn't really worth it, if the result is a source file that looks basically like a Wattsi source file. We'd want to be properly using a large set of Bikeshed-isms, e.g. the definition-scoping mentioned above, but also things like [=short/link=]s, Markdown paragraphs and other syntax, etc. We tentatively identified the goal of describing our desired subset of Bikeshed features to use as an initial step, so getting that discussion started somewhere would be good.

This also seems important to establish ahead of time since I suspect such features will increase the processing time.

Stability guarantees so that the HTML build isn't broken by Bikeshed changes. This can hopefully be addressed in Bikeshed's CI.

Yeah, I already test a whole bunch of specs as part of my CI, so I can see when something is going to break. HTML would definitely become a part of that.

This is not the kind of stability Philip and I discussed. Our concerns with Bikeshed stability stem from issues like its CI being constantly red, its PRs not receiving any code review from a second person, or the biggest breaking changes in Bikeshed history (the new parser) being released with no release notes or notification, or the whole definition panels saga.

(Briefly: we tried to maintain our definition panel styling ourselves, for uniformity across the WHATWG spec ecosystem and because we believe that fewer inline styles and scripts is better. Bikeshed started changing the markup structure with no warning, breaking definition panels for all WHATWG specs. We tried to keep up for a few weeks, but eventually had to give up and let Bikeshed put a bunch of inline styles and scripts into the documents because the markup structure kept changing too rapidly.

As a consequence of this, one day there was a huge regression in definition panels usability. Over the next few weeks, some of the regression was fixed, but the unwanted animations stayed around, and our requests to have them reverted were rebuffed.)

@tabatkins

It would not be. We do frequent rapid iteration and need our build tools to support this. I think 15 seconds on a slow machine like the one I'm testing now would be my hard limit.

Okay, that's good to know. @foolip, that changes the calculus for you significantly, then.

We'd want to be properly using a large set of Bikeshed-isms, e.g. the definition-scoping mentioned above, but also things like [=short/link=]s, Markdown paragraphs and other syntax, etc. [...] This also seems important to establish ahead of time since I suspect such features will increase the processing time.

I agree that being able to properly use Bikeshed's features would be useful. Based on my own explorations in this area, your fear about cost appears to be unfounded. I tested with "15 copies of DOM stapled together", which uses all the Bikeshed features and is roughly the same source size as HTML, and it's comparable in runtime to the roughly-bikeshedded HTML. Basically those features are either cheap enough to not make a difference, or you're already paying for them regardless of whether you use them.

Some tangential details:

The vast majority (about 75%) of processing time in Bikeshed is in the initial parsing: out of a ~60s runtime for the bikeshedded html spec on my laptop, it's spending 27s in *my* html parser, 16s in LXML's html parser, and 4s in my markdown parser. All the rest of Bikeshed's processing is 13s in total. I can certainly squeeze a little more blood out of that last "everything else" bit (I just knocked a second off with some trivial changes I spotted from the profile I generated to get these numbers...), but most of the perf win will be in improving parsing time, most importantly killing the extra LXML parse. That said, the LXML parse is still 16s, and that's with a mature project largely written in C, so I suspect that's about the lower bound I can hit; Python just costs a lot. We'll see, tho; I do have some big ideas here that might just be impossible for LXML to pursue, given their desired API shapes. In any case, moving to Rust will likely be the only "real" solution for knocking this out, since it's just a lot of string munging and struct creation.
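
For anyone wanting to reproduce this kind of breakdown, a profile can be inspected with the standard library; this assumes a dump produced by running Bikeshed under cProfile (the file name and invocation below are illustrative, not Bikeshed tooling):

# e.g. python -m cProfile -o bikeshed.prof $(which bikeshed) spec
import pstats

stats = pstats.Stats("bikeshed.prof")
stats.sort_stats("cumulative").print_stats(20)  # top 20 entries by cumulative time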

its CI being constantly red

You've mentioned this in the past, and I've explained why this is wrong in both its details and its overall gist.

  • The CI is always green when I cut releases, which is what's important for users. Virtually all usage of Bikeshed is via the pypi releases.
  • The vast majority of real-world time is spent with green CI anyway.
  • The vast majority of CI redness is just lint issues, which I almost always immediately clear anyway. Most of the rest is code on branches, which mixes with the rest of the things in the Actions tab, and is totally okay to be red during development.
  • "being red" is what CI is for, to a first approximation. As long as the audience being exposed to the redness is minimal and prepared for it, this is 100% okay.

I would appreciate it if you would stop bringing this up, as it's simply inaccurate.

its PRs not receiving any code review from a second person

An unfortunate consequence of Bikeshed being a 1-person dev team in practice, yeah. Would be nice to have more engineers on it, but that hasn't been something worth pursuing headcount for.

the biggest breaking changes in Bikeshed history (the new parser) being released with no release notes or notification,

This was postmortem'd, and changes were made to my process to ensure it didn't happen again. (The issue was a mistake in how I was testing, so I missed some very significant changes that should have prevented the release). I'll note that the subsequent re-release of the new parser was indeed smooth, and another similarly-large change recently (the upgrade to move all special inline syntaxes into the parser) went off extremely smoothly, with virtually all "issues" downstream users encountered being existing errors getting caught that were previously being silently ignored, sometimes resulting in actual errors making it into specs.

the whole definition panels saga

Yup, I wasn't particularly happy with that either; some of that usability regression should have been caught by manual testing. I was trusting a new dev on the project to do things well; growing pains.

That said, I don't see any "rebuffing" in that issue. I disagreed and thought (still think!) some aspects of the new behavior were reasonable, and there were no further responses on the thread so I assumed it was acceptable. If this is incorrect and y'all are still annoyed with any of this, please leave a comment to that effect. As you well know from spec editing, silence is usually taken as acceptance.


I'm assuming that we'll drop the "HTML in Bikeshed" project for now until/unless Bikeshed is able to deliver more comparable perf numbers. But when that is the case, I'm happy to discuss what more can be done for stability.

@domenic
Member

domenic commented Mar 15, 2025

I would appreciate it if you would stop bringing this up, as it's simply inaccurate.

I don't plan to stop, sorry. It's not inaccurate, as one can see from a glance at https://github.com/speced/bikeshed/commits/main/. And CI being red is not what CI is for; it's a sign of a project with insufficient review structure and engineering rigor. It's certainly not acceptable in the majority of open source projects, as you can confirm by checking around the ecosystem.

@foolip
Member Author

foolip commented Mar 17, 2025

@domenic @tabatkins I expect there will be some additional work needed on performance. I think the most useful comparison is how long it takes to run bikeshed vs wattsi from within build.sh, and I think that will dominate the time taken for ./build.sh --fast. On my machine it's currently 32 vs. 3 seconds.

However, the Bikeshed source produced by #296 is still quite broken and produces 45k lines of errors. Just printing the errors takes some time, and I'd like to look at the performance with a more realistic input after some more work.

If it should turn out that parsing is a big chunk of the time, then the LXML approach might help a lot. In a quick test, parsing source-whatwg-complete.bs with lxml.html.parse and walking the tree with LXML takes 0.27 seconds. So ultimately it should come down to how fast Bikeshed is with everything that comes after HTML parsing.
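
For concreteness, the quick test described above is roughly the following (a sketch; the file name is the generated Bikeshed source from #296):

import time
import lxml.html

start = time.perf_counter()
tree = lxml.html.parse("source-whatwg-complete.bs")
count = sum(1 for _ in tree.iter())  # walk every element once
print(f"walked {count} elements in {time.perf_counter() - start:.2f}s")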

We can also look into a --fast mode for Bikeshed that just skips some work that isn't as important for local iteration.

@foolip
Member Author

foolip commented Mar 17, 2025

The other big blocker is multipage output, which I'm working on right now and expect to have done by the end of the month.

@tabatkins I'll add it to the list up top. Do you have a tracking issue for this?

@tabatkins

I'm not going to argue with you further about this, as you're not meaningfully engaging with what I'm saying. It's not currently relevant, and I can deal with it when and if it becomes so.

@tabatkins

Do you have a tracking issue for this?

Yup, speced/bikeshed#269

and produces 45k lines of errors.

Hmm, I should do a little testing to see how much that really is costing. I purposely don't pay attention to the cost of printing errors, because they should be rare and it's much better for them to be useful even if it's expensive to construct the message. I can turn off a handful of the common ones and see what that's costing.

If it should turn out that parsing is a big chunk of the time, then the LXML approach might help a lot.

So this turned out to be really important. I was despairing about ever getting a really significant win, enough to bring us back into the realms requested here, since the lxml parser was only about 40% faster than mine even tho it was written in C. But it turns out I was never using the lxml parser, I was using the html5lib parser, which is pure Python and not very optimized (it just uses the lxml tree structure). Swapping out for the actual lxml html parser made that ~16s of parsing time drop down to about 0.25s. It's nowhere near spec-compliant, but now that my parser checks for a lot of document structure issues in the first place, that's not nearly as concerning. It's producing some slightly different document structure so I've got some bugs to iron out, but I'm already quite happy with this - it drops total time down to about 40s on my laptop now, from 60s. (I've made a few more tweaks that account for the additional handful of lost seconds.)
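
To illustrate the swap being described (a rough sketch, not Bikeshed's actual code): both paths below produce lxml trees, but the first goes through html5lib's pure-Python tokenizer while the second uses lxml's C HTML parser directly.

import html5lib
import lxml.html

with open("source-whatwg-complete.bs", "rb") as f:
    markup = f.read()

# Spec-compliant, pure-Python parse that builds an lxml tree (the slow path).
tree_slow = html5lib.parse(markup, treebuilder="lxml")

# lxml's own C HTML parser (much faster, but not fully spec-compliant).
tree_fast = lxml.html.fromstring(markup)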

That, then, suggests that there's incredible amounts of win available if I shift my parser into C, or ideally Rust. I know Rust-in-Python has gotten a lot more love in the last few years and should be quite usable these days. So I'll make that a Q2 project. If it goes well, it means I could virtually eliminate all my parsing costs, dropping the entire parse step to 1s or less on the html spec. I'd be left with just the 14s or so of "everything else", and I know there's still juice to be wrung out there. (I might even be able to transfer some of it over to Rust as well!) There's also definitely a few spots that could be turned off for a "fast path" by default when running locally (but turned back on in CI), like verifying that arg names aren't misspelled (0.8s right now) or that 2119 keywords aren't misused (1.2s right now).

So that shifts my entire outlook from "ugh, this is probably impossible" to "this is extremely possible and very likely even in the short term". I'll go ahead and keep my plans for now, and just shift my next quarter's project a bit to accommodate.
