tools/content: Support surveying unimplemented KaTeX features #1600

rajveermalviya · 2025-06-17T14:57:38Z

Stacked on top of #1601.

gnprice

Very useful, thanks!

I read through the code, then ran this on one open math community in order to see what the output looks like. Comments below, which I think should all be pretty quick.

Let's prioritize this next after the PR split described at #1452 (comment) and the small rebase at #1601 (comment).

In particular it'd be great to be able to merge this right after that new first segment of #1452. Then we can use it to measure the impact of the remaining part of #1452, of #1559, and of #1601.

gnprice · 2025-06-18T22:15:14Z

tools/content/unimplemented_katex_test.dart

+    buf.writeln('Out of $totalMessageCount total messages,'
+      ' ${katexMessageIds.length} of them were KaTeX containing messages'
+      ' and ${failedKatexMessageIds.length} of them failed.');


nit:

Suggested change

-    buf.writeln('Out of $totalMessageCount total messages,'
-      ' ${katexMessageIds.length} of them were KaTeX containing messages'
-      ' and ${failedKatexMessageIds.length} of them failed.');
+    buf.writeln('Out of $totalMessageCount total messages,'
+      ' ${katexMessageIds.length} of them were KaTeX containing messages'
+      ' and ${failedKatexMessageIds.length} of those failed.');

(otherwise it sounds like that's relative to the total messages rather than the KaTeX-containing messages)

gnprice · 2025-06-18T22:15:27Z

tools/content/unimplemented_katex_test.dart

+      buf.writeln('  Message IDs (upto 100): ${messageIds.take(100).join(', ')}');
+      buf.writeln('  TeX source (upto 30):');


nit: "up to" is two words

Suggested change

-      buf.writeln('  Message IDs (upto 100): ${messageIds.take(100).join(', ')}');
-      buf.writeln('  TeX source (upto 30):');
+      buf.writeln('  Message IDs (up to 100): ${messageIds.take(100).join(', ')}');
+      buf.writeln('  TeX source (up to 30):');

gnprice · 2025-06-18T22:21:03Z

tools/content/check-features

@@ -50,7 +50,7 @@ opt_verbose=
 opt_steps=()
 while (( $# )); do
    case "$1" in
-        fetch|check) opt_steps+=("$1"); shift;;
+        fetch|check|katex-check) opt_steps+=("$1"); shift;;


Let's update the usage message too (the --help output), to keep it in sync.

gnprice · 2025-06-18T23:36:55Z

tools/content/unimplemented_katex_test.dart

+      for (final node in failedMathNodes.take(30)) {
+        final type = switch (node) {
+          MathBlockNode() => 'block',
+          MathInlineNode() => 'inline',
+        };
+        buf.writeln('    $type: "${node.texSource}"');


Running this on one example server, some of the output looks like:

    inline: "\frac{1}{2} \delta_0 + \frac{1}{2} \delta_1"
    inline: "\frac{1}{2} \delta_0 + \frac{1}{2} \delta_1"
    block: "f(x) = \begin{cases} diamond(x) & x \text{ not in top corner} \\ f(4x) & \text{otherwise} \end{cases}"
    inline: "\displaystyle{ \frac{\partial y}{\partial x} > 0}"

I think the key things I'd want from this output are to (a) skim it by eye, (b) copy-paste some of the examples into a Zulip test message. The "inline:" and quotes get in the way of (b), and don't help with (a) either.

How about trying to print them in Zulip Markdown syntax? So e.g. $$ \frac{1}{2} \delta_0 + \frac{1}{2} \delta_1 $$, and something appropriate for blocks. (It might need to lose the indentation, which is fine. Just put a separator line before the next heading, similar to the separator line called divider in the other script.)
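A minimal sketch of what that output could look like, assuming inline math is written as `$$ … $$` and block math as a fenced block tagged `math` in Zulip Markdown. The `MathNode` classes here are simplified stand-ins for the real ones in the app, and `formatAsZulipMarkdown` and the `divider` value are hypothetical names chosen for illustration.

```dart
// Simplified stand-ins for the app's math node classes (assumption:
// only `texSource` matters for this sketch).
sealed class MathNode {
  const MathNode(this.texSource);
  final String texSource;
}

class MathBlockNode extends MathNode {
  const MathBlockNode(super.texSource);
}

class MathInlineNode extends MathNode {
  const MathInlineNode(super.texSource);
}

// Hypothetical separator line, printed before the next heading.
final String divider = '-' * 40;

/// Formats [node] so it can be pasted directly into a Zulip message.
String formatAsZulipMarkdown(MathNode node) {
  // Build the fence at runtime to avoid a literal triple backtick here.
  final fence = '`' * 3;
  return switch (node) {
    MathInlineNode() => '\$\$ ${node.texSource} \$\$',
    MathBlockNode() => '${fence}math\n${node.texSource}\n$fence',
  };
}

void main() {
  final nodes = <MathNode>[
    const MathInlineNode(r'\frac{1}{2} \delta_0 + \frac{1}{2} \delta_1'),
    const MathBlockNode(
      r'f(x) = \begin{cases} diamond(x) & x \text{ not in top corner} \\ f(4x) & \text{otherwise} \end{cases}'),
  ];
  final buf = StringBuffer();
  for (final node in nodes) {
    // No indentation and no quotes, so each example can be copied as-is.
    buf.writeln(formatAsZulipMarkdown(node));
  }
  buf.writeln(divider);
  print(buf);
}
```

Dropping the `inline:` / `block:` labels loses nothing here, because the Markdown syntax itself shows which kind of node each example came from.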

gnprice · 2025-06-18T23:38:01Z

tools/content/unimplemented_katex_test.dart

+      buf.writeln('  HTML (upto 10):');
+      for (final node in failedMathNodes.take(10)) {
+        buf.writeln('    ${node.debugHtmlText}');


These get pretty voluminous — try cutting them down to 3 examples.

gnprice · 2025-06-18T23:38:27Z

tools/content/unimplemented_katex_test.dart

+      for (final node in failedMathNodes.take(10)) {
+        buf.writeln('    ${node.debugHtmlText}');
+        buf.writeln();
+      }


This seems to end up with a blank line separating the examples. Maybe skip the extra writeln?

gnprice · 2025-06-18T23:46:24Z

tools/content/unimplemented_katex_test.dart

+    for (final MapEntry(key: reason, value: messageIds) in failedMessageIdsByReason.entries.sorted(
+      (a, b) => b.value.length.compareTo(a.value.length),
+    )) {
+      final failedMathNodes = failedMathNodesByReason[reason]!;


A lot of the TeX examples are looking redundant in the output I'm seeing. Like this:

Because of unsupported css class: nulldelimiter: 410 messages failed.
  Oldest message: 191982974, Newest message: 524791558
  Message IDs (upto 100): 507123830, 509468402, […], 489054468, 453621753
  TeX source (upto 30):
    inline: "\phi D \phi = \frac{1}{2}D(\phi^2)"
    inline: "\phi D \phi = \frac{1}{2}D(\phi^2)"
    inline: "\frac{n!}{k!(n-k)!}"
    inline: "\frac{n!}{k!(n-k)!}"
    inline: "k \frac{p_1}{2} = 0"
    inline: "k \frac{p_1}{2} = 0"
    inline: "\frac{\mathsf{Free}(G)}{\sim}"
    inline: "\frac{\mathsf{Free}(G)}{\sim}"
    inline: "\frac{\mathsf{Free}(G)}{\sim} \cong H_1(G)"
    inline: "\frac{\mathsf{Free}(G)}{\sim} \cong H_1(G)"
    […]

In addition to expressions that are exactly equal, there are some that are similar (presumably because they came from the same conversation), which means each one adds less new information after the others. That makes the "up to 30" examples less useful than they could be.

I think two things would help:

* We can keep the failed math nodes for each reason as a set, rather than a list. That way if the identical node has e.g. the same unsupported CSS class in multiple places, it appears just once.

* Let's take a random 30 examples instead of the first 30 found:

Suggested change

-      final failedMathNodes = failedMathNodesByReason[reason]!;
+      final failedMathNodes = failedMathNodesByReason[reason]!;
+      failedMathNodes.shuffle();

Shuffling once at the top also means that when we take 3 or 10 for HTML, they'll be from among the examples we're showing TeX for. That's good for being able to compare the two formats to each other.
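One caveat when combining the two suggestions: a Dart `Set` has no `shuffle` method, so a set has to be copied into a list before shuffling. A rough sketch, assuming duplicates are identified by their TeX source string (the real `MathNode` may or may not define equality that way); `sampleExamples` is a hypothetical helper, not part of the script.

```dart
import 'dart:math';

/// Returns up to [limit] distinct examples from [texSources], in random order.
List<String> sampleExamples(Iterable<String> texSources, int limit,
    {Random? random}) {
  // toSet() drops exact duplicates; toList() gives something shuffleable.
  final distinct = texSources.toSet().toList();
  distinct.shuffle(random);
  return distinct.take(limit).toList();
}

void main() {
  final found = [
    r'\frac{n!}{k!(n-k)!}',
    r'\frac{n!}{k!(n-k)!}',
    r'k \frac{p_1}{2} = 0',
    r'k \frac{p_1}{2} = 0',
    r'\phi D \phi = \frac{1}{2}D(\phi^2)',
  ];
  // Shuffle once, then take prefixes: the examples used for HTML are
  // always among the examples used for TeX.
  final shuffled = sampleExamples(found, 30, random: Random(0));
  final forTex = shuffled.take(30);
  final forHtml = shuffled.take(3);
  print(forTex.length);
  print(forHtml.every(forTex.contains));
}
```

Passing a seeded `Random` is optional; it just makes a run reproducible when comparing output before and after a parser change.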

gnprice · 2025-06-18T23:53:24Z

lib/model/katex.dart

+                  debugHtmlNode: kDebugMode ? innerSpan : null,
+                  node: innerSpanNode));
+              } else {
+                throw _KatexHtmlParseError();


Running this on one example server, there's just one "unknown" among the common failure reasons, which is this line:

Because of hard fail: unknown "#0 _KatexParser._parseSpan (package:zulip/model/katex.dart:317:17)": 568 messages failed.

(out of 4776 total failed messages, out of 28109 total messages with KaTeX).

So this would be good to fill in with a few words about what the unexpected structure is.

Of the examples I'm currently seeing, the shortest looks to be:
$$ \widehat{C} $$

<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>C</mi><mo stretchy="true">^</mo></mover></mrow><annotation encoding="application/x-tex">\widehat{C}</annotation></semantics></math>C<svg width="100%" height="0.24em" viewBox="0 0 1062 239" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg"><path d="M529 0h5l519 115c5 1 9 5 9 10 0 1-1 2-1 3l-4 22 c-1 5-5 9-11 9h-2L532 67 19 159h-2c-5 0-9-4-11-9l-5-22c-1-6 2-12 8-13z"></path></svg>

This is most likely because the inner span of a vlist has an unexpected CSS class value when the vlist hosts an svg:

zulip-flutter/lib/model/katex.dart
Lines 195 to 198 in 2c5a28c

        for (final innerSpan in vlist.nodes) {
          if (innerSpan case dom.Element(
            localName: 'span',
            className: '',

So the fix would be something like:

diff --git a/lib/model/katex.dart b/lib/model/katex.dart
index 7022fb5a..a3babee5 100644
--- a/lib/model/katex.dart
+++ b/lib/model/katex.dart
@@ -195,13 +195,17 @@ class _KatexParser {
         for (final innerSpan in vlist.nodes) {
           if (innerSpan case dom.Element(
             localName: 'span',
-            className: '',
             nodes: [
               dom.Element(localName: 'span', className: 'pstrut') && final pstrutSpan,
               ...final otherSpans,
             ],
           )) {
+            if (innerSpan.className != '') {
+              throw KatexHtmlParseError('vlist inner span has unexpected'
+                ' CSS class: ${innerSpan.className}');
+            }
+
             var styles = _parseSpanInlineStyles(innerSpan)!;
             final topEm = styles.topEm ?? 0;

But I'm not sure whether it should be included in #1452, as we are trying to keep the state of unmerged but released PRs as-is.

rajveermalviya · 2025-06-19T15:45:41Z

Thanks for the review @gnprice! Revision pushed.

gnprice

Thanks for the revision! Just two small comments — otherwise looks good.

And please go ahead and start posting the results you're finding, in a thread in #mobile-team.

gnprice · 2025-06-19T21:08:52Z

tools/content/check-features

+  katex-check   Check for unimplemented KaTeX features.  This requires the
+                corpus directory \`CORPUS_DIR\` to contain at least one corpus


nit: use same indentation as the other steps and options

Suggested change

-  katex-check   Check for unimplemented KaTeX features.  This requires the
-                corpus directory \`CORPUS_DIR\` to contain at least one corpus
+  katex-check
+              Check for unimplemented KaTeX features.  This requires the
+              corpus directory \`CORPUS_DIR\` to contain at least one corpus

gnprice · 2025-06-19T21:12:57Z

tools/content/unimplemented_katex_test.dart

+    final failedMessageIdsByReason = <String, Set<int>>{};
+    final failedMathNodesByReason = <String, List<MathNode>>{};


bump part of #1600 (comment)

We can keep the failed math nodes for each reason as a set, rather than a list. That way if the identical node has e.g. the same unsupported CSS class in multiple places, it appears just once.

Less critical now that it's shuffled, but still better not to have dupes.

gnprice · 2025-06-19T21:17:07Z

lib/model/katex.dart

+                  debugHtmlNode: kDebugMode ? innerSpan : null,
+                  node: innerSpanNode));
+              } else {
+                throw _KatexHtmlParseError();


Yeah, that'd be good to include in this PR instead. This PR wasn't part of the releases, so having the commit appear in this PR helps us easily see that it's new since the releases.

(Though also this doesn't make a user-visible difference, right? It's throwing KatexHtmlParseError either way, and this just changes the string, which has no effect in the app itself. So if there had been a reason it was inconvenient to do as a separate commit and in this separate PR, we wouldn't lose much by including it in #1452.)

rajveermalviya added the "maintainer review" label (PR ready for review by Zulip maintainers) Jun 17, 2025

rajveermalviya force-pushed the pr-tex-content-survery branch 2 times, most recently from ec6f49e to 8c68c44 Compare June 18, 2025 18:33

gnprice reviewed Jun 18, 2025

rajveermalviya and others added 10 commits June 19, 2025 14:39

content test [nfc]: Use const for math block tests

b42f784

content test [nfc]: Enable skips in testParseExample and testParse

4231608

content [nfc]: Inline _logError in _KatexParser._parseSpan

04a3d25

This will prevent string interpolation being evaluated during release build. Especially useful in later commit where it becomes more expensive.

content [nfc]: Refactor _KatexParser._parseChildSpans to take list of…

e2022de

… nodes

content: Populate debugHtmlNode for KatexNode

82c6b01

content [nfc]: Reintroduce KatexNode as a base sealed class

b6da144

And rename previous type to KatexSpanNode, also while making it a subtype of KatexNode.

content: Ignore more KaTeX classes that don't have CSS definition

c8f7d0b

content: Handle 'mspace' and 'msupsub' KaTeX CSS classes

51fb5f5

content [nfc]: Remove the inline property in _Katex widget

db0c1d8

And inline the behaviour for `inline: false` in MathBlock widget.

content: Support parsing and handling inline styles for KaTeX content

0bf40b8

rajveermalviya force-pushed the pr-tex-content-survery branch from 8c68c44 to b42ed94 Compare June 19, 2025 14:10

rajveermalviya added 3 commits June 19, 2025 19:41

content [nfc]: Make MathNode a sealed class

3f8913c

content [nfc]: Make KatexHtmlParseError private

e6c0caf

content: Allow KaTeX parser to report failure reasons

867b8f3

rajveermalviya force-pushed the pr-tex-content-survery branch 2 times, most recently from 49fa272 to 19f89d8 Compare June 19, 2025 15:27

rajveermalviya mentioned this pull request Jun 19, 2025

KaTeX (2/n): Support horizontal and vertical offsets for spans #1452

Open

rajveermalviya added 2 commits June 19, 2025 23:28

tools/content: Support surveying unimplemented KaTeX features

801d726

tools/content: Add a flag to control verbosity of KaTeX check result

3d99ccd

rajveermalviya force-pushed the pr-tex-content-survery branch from 19f89d8 to 3d99ccd Compare June 19, 2025 17:58

gnprice reviewed Jun 19, 2025

gnprice added the "integration review" label (Added by maintainers when PR may be ready for integration) and removed the "maintainer review" label (PR ready for review by Zulip maintainers) Jun 19, 2025
