-
Notifications
You must be signed in to change notification settings - Fork 311
tools/content: Support surveying unimplemented KaTeX features #1600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ec6f49e
to
8c68c44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very useful, thanks!
I read through the code, then ran this on one open math community in order to see what the output looks like. Comments below, which I think should all be pretty quick.
Let's prioritize this next after the PR split described at #1452 (comment) and the small rebase at #1601 (comment) .
In particular it'd be great to be able to merge this right after that new first segment of #1452. Then we can use it to measure the impact of the remaining part of #1452, of #1559, and of #1601.
buf.writeln('Out of $totalMessageCount total messages,' | ||
' ${katexMessageIds.length} of them were KaTeX containing messages' | ||
' and ${failedKatexMessageIds.length} of them failed.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
buf.writeln('Out of $totalMessageCount total messages,' | |
' ${katexMessageIds.length} of them were KaTeX containing messages' | |
' and ${failedKatexMessageIds.length} of them failed.'); | |
buf.writeln('Out of $totalMessageCount total messages,' | |
' ${katexMessageIds.length} of them were KaTeX containing messages' | |
' and ${failedKatexMessageIds.length} of those failed.'); |
(otherwise it sounds like that's relative to the total messages rather than the KaTeX-containing messages)
buf.writeln(' Message IDs (upto 100): ${messageIds.take(100).join(', ')}'); | ||
buf.writeln(' TeX source (upto 30):'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "up to" is two words
buf.writeln(' Message IDs (upto 100): ${messageIds.take(100).join(', ')}'); | |
buf.writeln(' TeX source (upto 30):'); | |
buf.writeln(' Message IDs (up to 100): ${messageIds.take(100).join(', ')}'); | |
buf.writeln(' TeX source (up to 30):'); |
@@ -50,7 +50,7 @@ opt_verbose= | |||
opt_steps=() | |||
while (( $# )); do | |||
case "$1" in | |||
fetch|check) opt_steps+=("$1"); shift;; | |||
fetch|check|katex-check) opt_steps+=("$1"); shift;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update the usage message too (the --help
output), to keep it in sync.
for (final node in failedMathNodes.take(30)) { | ||
final type = switch (node) { | ||
MathBlockNode() => 'block', | ||
MathInlineNode() => 'inline', | ||
}; | ||
buf.writeln(' $type: "${node.texSource}"'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this on one example server, some of the output looks like:
inline: "\frac{1}{2} \delta_0 + \frac{1}{2} \delta_1"
inline: "\frac{1}{2} \delta_0 + \frac{1}{2} \delta_1"
block: "f(x) = \begin{cases}
diamond(x) & x \text{ not in top corner} \\
f(4x) & \text{otherwise}
\end{cases}"
inline: "\displaystyle{ \frac{\partial y}{\partial x} > 0}"
I think the key things I'd want from this output are to (a) skim it by eye, (b) copy-paste some of the examples into a Zulip test message. The "inline:" and quotes get in the way of (b), and don't help with (a) either.
How about trying to print them in Zulip Markdown syntax? So e.g. $$ \frac{1}{2} \delta_0 + \frac{1}{2} \delta_1 $$
, and something appropriate for blocks. (It might need to lose the indentation, which is fine. Just put a separator line before the next heading, similar to the separator line called divider
in the other script.)
buf.writeln(' HTML (upto 10):'); | ||
for (final node in failedMathNodes.take(10)) { | ||
buf.writeln(' ${node.debugHtmlText}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These get pretty voluminous — try cutting them down to 3 examples.
for (final node in failedMathNodes.take(10)) { | ||
buf.writeln(' ${node.debugHtmlText}'); | ||
buf.writeln(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to end up with a blank line separating the examples. Maybe skip the extra writeln
?
for (final MapEntry(key: reason, value: messageIds) in failedMessageIdsByReason.entries.sorted( | ||
(a, b) => b.value.length.compareTo(a.value.length), | ||
)) { | ||
final failedMathNodes = failedMathNodesByReason[reason]!; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the TeX examples are looking redundant in the output I'm seeing. Like this:
Because of unsupported css class: nulldelimiter:
410 messages failed.
Oldest message: 191982974, Newest message: 524791558
Message IDs (upto 100): 507123830, 509468402, […], 489054468, 453621753
TeX source (upto 30):
inline: "\phi D \phi = \frac{1}{2}D(\phi^2)"
inline: "\phi D \phi = \frac{1}{2}D(\phi^2)"
inline: "\frac{n!}{k!(n-k)!}"
inline: "\frac{n!}{k!(n-k)!}"
inline: "k \frac{p_1}{2} = 0"
inline: "k \frac{p_1}{2} = 0"
inline: "\frac{\mathsf{Free}(G)}{\sim}"
inline: "\frac{\mathsf{Free}(G)}{\sim}"
inline: "\frac{\mathsf{Free}(G)}{\sim} \cong H_1(G)"
inline: "\frac{\mathsf{Free}(G)}{\sim} \cong H_1(G)"
[…]
In addition to expressions that are exactly equal, there are some that are similar (presumably because they came from the same conversation), which means each one adds less new information after the others. That makes the "up to 30" examples less useful than they could be.
I think two things would help:
- We can keep the failed math nodes for each reason as a set, rather than a list. That way if the identical node has e.g. the same unsupported CSS class in multiple places, it appears just once.
- Let's take a random 30 examples instead of the first 30 found:
final failedMathNodes = failedMathNodesByReason[reason]!; | |
final failedMathNodes = failedMathNodesByReason[reason]!; | |
failedMathNodes.shuffle(); |
Shuffling once at the top also means that when we take 3 or 10 for HTML, they'll be from among the examples we're showing TeX for. That's good for being able to compare the two formats to each other.
lib/model/katex.dart
Outdated
debugHtmlNode: kDebugMode ? innerSpan : null, | ||
node: innerSpanNode)); | ||
} else { | ||
throw _KatexHtmlParseError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this on one example server, there's just one "unknown" among the common failure reasons, which is this line:
Because of hard fail: unknown "#0 _KatexParser._parseSpan (package:zulip/model/katex.dart:317:17)":
568 messages failed.
(out of 4776 total failed messages, out of 28109 total messages with KaTeX).
So this would be good to fill in with a few words about what the unexpected structure is.
Of the examples I'm currently seeing, the shortest looks to be:
$$ \widehat{C} $$
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>C</mi><mo stretchy="true">^</mo></mover></mrow><annotation encoding="application/x-tex">\widehat{C}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9233em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9233em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">C</span></span><span class="svg-align" style="width:calc(100% - 0.1667em);margin-left:0.1667em;top:-3.6833em;"><span class="pstrut" style="height:3em;"></span><span style="height:0.24em;"><svg width="100%" height="0.24em" viewBox="0 0 1062 239" preserveAspectRatio="none" xmlns="http://www.w3.org/2000/svg"><path d="M529 0h5l519 115c5 1 9 5 9 10 0 1-1 2-1 3l-4 22
c-1 5-5 9-11 9h-2L532 67 19 159h-2c-5 0-9-4-11-9l-5-22c-1-6 2-12 8-13z"></path></svg></span></span></span></span></span></span></span></span></span>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is most probably because of an unexpected CSS class value for inner span in a vlist when the vlist hosts an svg:
zulip-flutter/lib/model/katex.dart
Lines 195 to 198 in 2c5a28c
for (final innerSpan in vlist.nodes) { | |
if (innerSpan case dom.Element( | |
localName: 'span', | |
className: '', |
So the fix would be somewhat like:
diff --git a/lib/model/katex.dart b/lib/model/katex.dart
index 7022fb5a..a3babee5 100644
--- a/lib/model/katex.dart
+++ b/lib/model/katex.dart
@@ -195,13 +195,17 @@ class _KatexParser {
for (final innerSpan in vlist.nodes) {
if (innerSpan case dom.Element(
localName: 'span',
- className: '',
nodes: [
dom.Element(localName: 'span', className: 'pstrut') &&
final pstrutSpan,
...final otherSpans,
],
)) {
+ if (innerSpan.className != '') {
+ throw KatexHtmlParseError('vlist inner span has unexpected'
+ 'CSS class: ${innerSpan.className}');
+ }
+
var styles = _parseSpanInlineStyles(innerSpan)!;
final topEm = styles.topEm ?? 0;
But not sure if it should be included in #1452, as we are trying to keep the state of unmerged but released PRs as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that'd be good to include in this PR instead. This PR wasn't part of the releases, so having the commit appear in this PR helps us easily see that it's new since the releases.
(Though also this doesn't make a user-visible difference, right? It's throwing KatexHtmlParseError either way, and this just changes the string, which has no effect in the app itself. So if there had been a reason it was inconvenient to do as a separate commit and in this separate PR, we wouldn't lose much by including it in #1452.)
This will prevent string interpolation being evaluated during release build. Especially useful in later commit where it becomes more expensive.
And rename previous type to KatexSpanNode, also while making it a subtype of KatexNode.
And inline the behaviour for `inline: false` in MathBlock widget.
8c68c44
to
b42ed94
Compare
49fa272
to
19f89d8
Compare
Thanks for the review @gnprice! Revision pushed. |
19f89d8
to
3d99ccd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! Just two small comments — otherwise looks good.
And please go ahead and start posting the results you're finding, in a thread in #mobile-team.
katex-check Check for unimplemented KaTeX features. This requires the | ||
corpus directory \`CORPUS_DIR\` to contain at least one corpus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use same indentation as the other steps and options
katex-check Check for unimplemented KaTeX features. This requires the | |
corpus directory \`CORPUS_DIR\` to contain at least one corpus | |
katex-check | |
Check for unimplemented KaTeX features. This requires the | |
corpus directory \`CORPUS_DIR\` to contain at least one corpus |
final failedMessageIdsByReason = <String, Set<int>>{}; | ||
final failedMathNodesByReason = <String, List<MathNode>>{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump part of #1600 (comment)
We can keep the failed math nodes for each reason as a set, rather than a list. That way if the identical node has e.g. the same unsupported CSS class in multiple places, it appears just once.
Less critical now that it's shuffled, but still better not to have dupes.
lib/model/katex.dart
Outdated
debugHtmlNode: kDebugMode ? innerSpan : null, | ||
node: innerSpanNode)); | ||
} else { | ||
throw _KatexHtmlParseError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that'd be good to include in this PR instead. This PR wasn't part of the releases, so having the commit appear in this PR helps us easily see that it's new since the releases.
(Though also this doesn't make a user-visible difference, right? It's throwing KatexHtmlParseError either way, and this just changes the string, which has no effect in the app itself. So if there had been a reason it was inconvenient to do as a separate commit and in this separate PR, we wouldn't lose much by including it in #1452.)
Stacked on top of #1601.