language variations as separate intents #434
Comments
Examples in French:
I know that some of this also concerns English (e.g. the plural). Would it be a solution to allow "wildcards of a keyword", so that new language variations can be entered while staying connected to their base? Is it a well-known problem for AT writers that such a refinement is not fully accessible? I suppose so.
One position is that:
And the usual justification for this minimalist position is that it keeps the annotation burden small, while still allowing enough scaffolding for symbolic translation to be possible (if not always perfect). For the example, adding an Knowing to use the right verb forms for singular and plural nouns seems to be strictly in the domain of the implementer - intent already makes it possible to count the exact number of arguments. I think it is a known issue that symbolic approaches to language translation quickly become large complex systems (e.g. language grammars) when they increase the amount of supported vocabulary, variation and syntax. I can testify that this also happens to grammars which try to model the full variability in math syntax. That said, neural translation approaches (NMT) starting with annotated MathML+Intent can directly leverage the annotations as well, and learn the language variation from a set of examples. They are also large complex systems, but at least we don't have to build them by hand. In any case, either approach is possible on a MathML+Intent expression, and either approach can have difficulties in arriving at a flawless translation on previously unseen expressions.
I'll start with: gender in languages is hard for me to wrap my head around, but I do realize that a word may change based on the words around it or what it implicitly refers to. That can be hard to deal with, and I suspect that several systems, such as the one Murray implemented in MS products, just punt on that. There is definitely merit in getting something into users' hands that is understandable (is it?) vs. trying to come up with a perfect solution and having nothing to use. In your example, it seems that knowing that A is a set of female mathematicians results in different speech than if you know A is a set of male mathematicians. Is that correct? If so, I suspect that tagging it with ISA won't help much, because it requires understanding the gender of what will essentially be a "random" phrase. Pulling the gender from a "random" phrase is sure to be beyond what AT is capable of knowing. We could consider adding a gender component to ISA, but what else would need to be added? Seems like a rabbit hole, as @dginev indicated. As for the "subjonctif" case, one can speak the same expression in different ways in English, such as passive vs. active voice: "A is a subset of B" and "A subsets B". That's a little clunky in this example, but not at all with an operator like
Nope. A surface is a 2d-manifold, as differential geometry teaches it. And this thing, in French, like anything else, has a gender. Thus all adjectives that qualify it need to agree with it (the "accord"). This is nothing stylistic; the lack of it is just wrong. My worry is more that French has (just) two genders and that such variations are much richer in other languages. We can't pre-encode "gender" as an attribute of the class of an isa... it will keep popping up with more genders!
It is a requirement of the sentence. Nothing optional or stylistic. A different behaviour is just wrong French. But it might be acceptable for read-out-loud tools to do that...
@MurraySargent should probably know.
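To make the agreement requirement concrete, here is a minimal sketch of how an AT could keep grammatical gender in its own per-language lexicon instead of in the intent markup. Everything in it is an assumption for illustration: the table names, the `speak_fr` function, and the tiny vocabulary are hypothetical and not part of any intent proposal or existing AT.

```python
# Hypothetical per-language lexicon owned by the AT, not by the intent markup.
# Each concept name maps to a spoken French noun and its grammatical gender.
FRENCH_NOUNS = {
    "surface": ("surface", "f"),
    "set": ("ensemble", "m"),
}

# Adjective forms indexed by the gender of the noun they qualify.
FRENCH_ADJECTIVES = {
    "closed": {"m": "fermé", "f": "fermée"},
    "open": {"m": "ouvert", "f": "ouverte"},
}

FRENCH_INDEFINITE_ARTICLE = {"m": "un", "f": "une"}


def speak_fr(concept: str, adjective: str) -> str:
    """Build a French noun phrase whose article and adjective agree with the noun."""
    noun, gender = FRENCH_NOUNS[concept]
    article = FRENCH_INDEFINITE_ARTICLE[gender]
    return f"{article} {noun} {FRENCH_ADJECTIVES[adjective][gender]}"


print(speak_fr("surface", "closed"))  # une surface fermée
print(speak_fr("set", "open"))        # un ensemble ouvert
```

The point of the sketch is that the gender lives with the French noun, so a language with more genders or noun classes only needs a different lexicon, not a new intent attribute.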
I don’t think Microsoft's French math speech observes gender. Perhaps this isn't super crucial since AT math speech generally wants to be concise, not elegant. Sort of pseudo speech. You'd say "a sub 2", not "a subscript 2", etc. But a verbosity setting could change the degree of brevity.
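As a rough illustration of the verbosity idea, the sketch below switches between the terse and the full wording for a subscript. The function name `speak_subscript` and the `"terse"`/`"verbose"` setting values are hypothetical; they do not describe how Microsoft's or any other AT actually implements it.

```python
def speak_subscript(base: str, index: str, verbosity: str = "terse") -> str:
    """Speak a subscripted symbol, shortening the connective at low verbosity."""
    # Assumed setting values: "terse" gives pseudo speech, anything else the full word.
    connective = "sub" if verbosity == "terse" else "subscript"
    return f"{base} {connective} {index}"


print(speak_subscript("a", "2"))             # a sub 2
print(speak_subscript("a", "2", "verbose"))  # a subscript 2
```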
To reply to Paul's original example, and maybe get a little headway for the issue:
Luckily inclusion seems to have a differently spoken, but identically written main concept name in French: Inclusion (mathématiques). So the intent markup for the two examples ought to be If I knew a bit more French I could come up with some other annotation than underscore for X Y, but alas - apologies. Luckily that's not the crux of the question. I think the main concept name should keep its encyclopedic gender (as seen in encyclopedic resources for a specific language).
Sounds like a case of recognizing context for speech synthesis, maybe even one where a neural model would do a better job than a symbolic AT :> Luckily we have some recent tools which do not require annotating the full linguistic details to synthesize an eloquent speech string. I currently like the simplest approach - one intent entry per encyclopedic concept, and leaving the linguistic complexities to smart AT implementations (or to pedantic annotators aware of the
While working on the French translation of the raw intents that David F presented last Thursday, which took me less than 2h (not including Unicode symbols), I realized that the speak-aloud column in French may be richer than in English, and that adding many more languages is going to trigger many more variations.
This issue is to discuss how far intents can cope with this expectation. Clearly, handcrafted enrichments of intents are in a position to do it in many cases. Should the predefined intents' keywords ("core intents") simply ignore the variations?