Intent and internationalization #409
I think we should keep the dictionary lookup as currently described (insensitive to case and to the separators `[-_. ]`, but otherwise an exact match). But a given system should be allowed to have a localised dictionary (or, equivalently, columns of localised names giving localised entries). A system should also be allowed to return a speech hint in any localisation, whatever entry was used for the lookup. For the two forms `intent="open-interval($a,$b)"` and `intent="отворен-интервал($a,$b)"` (Bulgarian for "open interval"), that would mean:
In the first case the lookup SHOULD succeed (as open-interval is in Core Intent), but it MAY return either open-interval or отворен-интервал depending on the system. In the second case the lookup MAY succeed; if it fails, the default reading of отворен-интервал($a,$b) would be given. So using the first form would maximise the chance of the lookup succeeding but may end up with English text, while using the second form ensures a Bulgarian reading, though perhaps only the default literal reading of the intent. Either would be a reasonable choice, depending on requirements. But these are just initial thoughts on how I'd imagined intent working here, not a fully worked plan. (This also only addresses intent lookup, not your examples of notational differences; those need more thought.)
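A minimal Python sketch of the lookup described above, under stated assumptions: the separators `-`, `_`, `.` and space are treated as interchangeable (normalized to `-`), case is folded with `str.casefold`, and the tables `CORE` and `LOCAL_ALIASES` and the function `speak` are hypothetical, not taken from any real assistive technology or from the intent specification. It shows the two outcomes discussed: the English Core name succeeds in any language (possibly falling back to English text), while the Bulgarian name succeeds only where a localised column exists and otherwise gets a default literal reading.

```python
import re

# Assumption: "-", "_", "." and space are interchangeable separators in concept names.
SEPARATORS = re.compile(r"[-_. ]")

# Hypothetical Core entries keyed by normalized English name, with per-language templates.
CORE = {
    "open-interval": {
        "en": "the open interval from {} to {}",
        "bg": "отворен интервал от {} до {}",  # Bulgarian: "open interval from {} to {}"
    },
}

# Hypothetical localised column: Bulgarian names that resolve to Core entries.
LOCAL_ALIASES = {"bg": {"отворен-интервал": "open-interval"}}


def normalize(name: str) -> str:
    """Case-insensitive and separator-insensitive key for dictionary lookup."""
    return SEPARATORS.sub("-", name).casefold()


def speak(name: str, args: list[str], lang: str) -> str:
    key = normalize(name)
    # Map a localised name onto its Core key, if this language has such a column.
    key = LOCAL_ALIASES.get(lang, {}).get(key, key)
    entry = CORE.get(key)
    if entry is None:
        # Lookup failed: default literal reading of the name followed by its arguments.
        return " ".join([SEPARATORS.sub(" ", name), *args])
    # Lookup succeeded: the speech falls back to English if no translation exists.
    return entry.get(lang, entry["en"]).format(*args)
```

With these tables, `speak("open-interval", ["a", "b"], "de")` comes back in English, while `speak("отворен-интервал", ["a", "b"], "fr")` gets only the literal reading "отворен интервал a b".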
@dginev's Bulgarian examples often use Cyrillic in subscripts. I see no problem with that. Where I can see a possible problem is with RTL languages doing the same. But at least we have a solution with Unicode or
Finding out whether this is universal is something I hope we can do (basically, by finding someone who knows more about vertical-mode writing systems).
I agree with @davidcarlisle's view on what will happen for names in Core and not in Core, with the correction that if a name is in Core, the reading may not say "open interval". It may use "the interval from a to b, not including a or b" or something else. Core names are biased towards English in that if a translation for the language doesn't exist, you end up with English, because those are the words. But unless every bit of the math is marked up with

I find some of the notations clever, like ÷ for a (presumably) geometric progression. Here are a few comments on others:
@NSoiffer, filling in some minor details you asked about:
That's exactly it. An implication that may not be immediately obvious is that such differences result in "localizing" any "Default interpretation" rules for intent to a specific practice. Take 1,9: is it the number 1,9 in Bulgarian, or a list of 1 and 9 in the US? In practice that may mean that a big part of the remediation of non-English documents will be neutralizing inappropriate defaults, or, similar to my expectation for arXiv, that advanced defaults (trying to infer intervals, lists, etc.) will be completely disabled. Whichever gets more mileage.
Right, largely not problematic. The reason I took the trouble to include the variety of uses of Cyrillic in those two books (annotations, variable names, function names) was to demonstrate that Cyrillic is an alphabet used in math syntax. The relevant background is that Murray S. had repeatedly said he expected Cyrillic not to be used, to the point of overriding that block to behave differently in a particular use of Braille. I apologize for not remembering the technical details of exactly which Braille feature was being enabled.
@dginev wrote:
> Take 1,9 - is it the number 1,9 in Bulgarian, or a list of 1 and 9 in the US?

If it is correct MathML, it can't be the number 1,9 because that should be `<mn>1,9</mn>`.
On the point about number syntax switching "." and ",": in Switzerland I recently saw numbers written as "1'234'567", and spaces between digit groups are also common. Also, for the record: Asian countries tend to use groups of four digits, not three as is common in the Western world.
On 8/10/22 20:44, NSoiffer wrote:
> @dginev <https://github.com/dginev> wrote:
>> Take 1,9 - is it the number 1,9 in Bulgarian, or a list of 1 and 9 in the US?
>
> If it is correct MathML, it can't be the number 1,9 because that *should* be `<mn>1,9</mn>`.

I disagree. The description of mn in the spec is a bit wishy-washy. It starts off with "Generally speaking, a numeric literal is a sequence of digits, perhaps including a decimal point..." but later says "since mn is a presentation element, there are a few situations where it may be desirable to include arbitrary text...".

My reading of this is that the content of mn is a "literal number", but it doesn't say *which* number; it's presentation after all. That's content or intent's job.

So it seems to me quite legitimate that either comma or period can be the "decimal point", while period or comma or even spacing can be used as the thousands (or 10K, or 100K...) separator.

[Aside: I've often wondered whether the "point" of "decimal point" was universally understood as "dot" rather than "location". Merriam-Webster says: "Definition of decimal point: a period, centered dot, or in some countries a comma at the left of ..." So they've generalized it to comma, but not to RTL! :> ]
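To make the number-syntax point concrete: the same `mn` content yields a different value, or no valid value at all, depending on which decimal mark and digit-grouping convention a reader assumes. The Python sketch below uses a hypothetical helper, `parse_mn`, that is not part of any MathML tool; its defaults (decimal dot, groups of three separated by comma, apostrophe, space or no-break space) and the `group_size` parameter for four-digit grouping are assumptions for illustration. A real system would have to take the convention from intent or from the document language rather than from function arguments.

```python
import re
from decimal import Decimal


def parse_mn(text, decimal_mark=".", group_seps=",' \u00a0", group_size=3):
    """Hypothetical helper: read <mn> content as a number under one locale convention.

    Raises ValueError when the text is not a well-formed number under that convention.
    """
    # A character cannot be both the decimal mark and a group separator.
    seps = [s for s in group_seps if s != decimal_mark]
    int_part, _, frac_part = text.partition(decimal_mark)
    if decimal_mark in frac_part:
        raise ValueError(f"more than one decimal mark in {text!r}")
    groups = re.split("|".join(map(re.escape, seps)), int_part) if seps else [int_part]
    digits = re.compile(r"[0-9]+")
    if not all(digits.fullmatch(g) for g in groups) or (
        frac_part and not digits.fullmatch(frac_part)
    ):
        raise ValueError(f"not a number under this convention: {text!r}")
    # The first group may be short; every later group must have exactly group_size digits.
    if len(groups) > 1 and (
        len(groups[0]) > group_size or any(len(g) != group_size for g in groups[1:])
    ):
        raise ValueError(f"unexpected digit grouping in {text!r}")
    return Decimal("".join(groups) + ("." + frac_part if frac_part else ""))
```

Under a decimal-comma convention `parse_mn("1,9", decimal_mark=",")` gives `Decimal('1.9')`, while under the US-style defaults the same string raises `ValueError`, which is exactly the case where presentation alone cannot decide and intent has to.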
A related detail here is that we may want to explicitly mention how/if As some examples:
On a separate note, consider:
<math>
  <mn xml:lang="en">2</mn>
  <mo>=</mo>
  <mn xml:lang="bg">2</mn>
</math>
This ought to render the two numbers identically on a screen and on a Braille display, but it may (or may not; open question) speak them differently. We could of course decide that this level of complexity is unlikely to be useful, and recommend against using language annotations on any inner node. I'm just jotting down the question; it may be partially related to the technical details of #425.
NVDA actually sticks a I haven't seen anything about how
While I feel it is ok for
That's everywhere in Europe.
Way fascinating! Please bring me a scan, @NSoiffer, so I can add this to the notation census!
I am doubtful we should suggest multilingual processing of intents within a single page.
@polx, for digit grouping, Wikipedia has some discussion.
To follow up on my example here: today I was reminded (while listening to a talk by a UK speaker) that there are many names for the number 0 in English. The Wikipedia article contains a nice overview, as usual. So one can imagine my didactic example one level down the
<math>
  <mn xml:lang="en-uk">0</mn>
  <mo>=</mo>
  <mn xml:lang="en-us">0</mn>
</math>
which may be expected to produce "naught equals zero". The slang narration examples from Wikipedia may also be good illustrations for underscore use, as in
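A sketch of how a speech generator might honour `xml:lang` on inner nodes, using the "naught equals zero" example. It relies on the standard XML rule that `xml:lang` is inherited from the nearest ancestor that sets it. The reading tables `NUMBER_NAMES` and `OPERATOR_NAMES` and the function `read` are hypothetical, and the sketch assumes un-namespaced MathML as in the snippet (real documents would normally be in the MathML namespace).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical per-language readings of number tokens (keys are lowercased tags).
NUMBER_NAMES = {"en-us": {"0": "zero"}, "en-uk": {"0": "naught"}}
OPERATOR_NAMES = {"=": "equals"}


def read(elem, inherited_lang="en"):
    """Return the words for elem, passing the effective xml:lang down to children."""
    lang = elem.get(XML_LANG, inherited_lang)
    text = (elem.text or "").strip()
    if elem.tag == "mn":
        return [NUMBER_NAMES.get(lang.lower(), {}).get(text, text)]
    if elem.tag == "mo":
        return [OPERATOR_NAMES.get(text, text)]
    words = []
    for child in elem:
        words.extend(read(child, lang))
    return words


src = """<math>
  <mn xml:lang="en-uk">0</mn>
  <mo>=</mo>
  <mn xml:lang="en-us">0</mn>
</math>"""
print(" ".join(read(ET.fromstring(src))))
```

Only the speech layer consults the language here; rendering the two `mn` elements is unaffected, which matches the expectation that they display identically.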
We discussed in the call today that the specific mapping between a Core intent name (the Core list being a single list in English) and its translation into a different language is not a Core connection between the two, but an Open one. As such, the details of connecting translations of Core concepts to their English counterparts are deferred to the way Open lists are organized, and we can close this issue without further action. Naturally, we can open new issues for other internationalization questions.
Tying up my last loose end, here is a write-up of the Bulgarian notation examples in math and chemistry that I am currently aware of:
https://hackmd.io/@dginev/Bk3RY5C6c
Systems by Western authors (such as MathCAT) would likely need explicit intent annotations on all notations that are unknown to, or conflict with, standard US conventions. For a "worst-case but valid" example of markup, consider localizing the open interval from 1.1 to 1.9, assuming the presentation tree is heavily fragmented because a hypothetical MathML generator recognizes only a decimal dot, not a decimal comma, and thus chunks each decimal number into two mn elements.
The main questions of this issue touch on w3c/mathml-docs#40, but are distinctly different:
What should human remediators who are working on localized documents in non-English languages deposit as the "concept name" values of intent? Do we encourage localized alphabets or insist on English names?
How (if at all) will such values connect to the "Core" and "Open" Intent list values when they are simple translations? (e.g. "отворен-интервал" === "open-interval", "функция-на-Акерман" === "Ackermann-function")
It could help to have examples from additional language+alphabet combinations, but I hope this much is already helpful to bootstrap a discussion.