Create a simple process for updating crk FST into itwêwina #257

Open
aarppe opened this issue Feb 9, 2020 · 3 comments
Labels
Improvement: Expansion or improvement of a current functionality that does already work and meets previous specs
requires-backend-work: Requires work to Python, scripts, automation, etc.

Comments

Contributor

aarppe commented Feb 9, 2020

As a more permanent solution for #256: since the crk FST will keep being updated following CW updates and modifications to the affixation, we need a streamlined process that updates the itwêwina FSTs (descriptive analyzer and normative generator) and the content generated with those FSTs (paradigm content).

As part of that process, we need some diagnostics to check that an update won't break itwêwina. The paradigms in giella/langs/crk/test/src/gt-norm-yamls are the likely candidates for this.
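A minimal sketch of what such a diagnostic could look like, assuming the expected paradigm forms have already been read from the YAML test files into a plain dictionary (the loading step is not shown) and that the updated generator is wrapped in some callable. The names `check_paradigms`, `Expected` and `Generator` are hypothetical, not part of itwêwina or the Giella infrastructure.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical shapes: each analysis string maps to the set of surface forms
# the paradigm tests expect; the generator is any callable that returns the
# forms produced for one analysis (for example, a wrapper around the new FST).
Expected = Dict[str, Set[str]]
Generator = Callable[[str], Iterable[str]]
Failure = Tuple[str, Set[str], Set[str]]


def check_paradigms(expected: Expected, generate: Generator) -> List[Failure]:
    """Return (analysis, missing, unexpected) for every analysis whose
    generated forms differ from the expected ones."""
    failures: List[Failure] = []
    for analysis, wanted in sorted(expected.items()):
        got = set(generate(analysis))
        missing = wanted - got      # expected forms the FST no longer produces
        unexpected = got - wanted   # forms the FST produces that are not expected
        if missing or unexpected:
            failures.append((analysis, missing, unexpected))
    return failures
```

An empty result would mean the updated generator still reproduces every expected form; a non-empty one lists exactly which analyses changed, which could be reviewed before the new FSTs are deployed.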

aarppe added the Improvement label Feb 9, 2020
This was referenced Mar 25, 2020
eddieantonio added the requires-backend-work label Apr 29, 2020
Contributor Author

aarppe commented Sep 30, 2021

@andrewdotn @nienna73 I believe this has now in practice been implemented as part of, or in conjunction with, incremental import? Anything that remains can be noted here; otherwise, can this and the associated issues be considered closed?

Contributor Author

aarppe commented Feb 1, 2023

The following XFSCRIPT code should build the normative generator (with morpheme boundaries) and the descriptive analyzer for crk from the components already in giellalt/lang-crk/. It needs the latest full lexicon (lexicon.tmp.lexc, or the compiled lexicon.hfst / lexicon.fomabin), the phonological rules (phonology.xfscript), and the composable version of the spell-relax rules (spellrelax.compose.hfst).

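```
# Build the morphology from the full LEXC lexicon.
read lexc src/fst/lexicon.tmp.lexc
# Alternative: load a precompiled lexicon instead.
# load src/fst/lexicon.hfst
define Morphology

# Run the phonology script; the net it leaves on the stack becomes Phonology.
source src/fst/phonology.xfscript
define Phonology

# Exclude analyses tagged as fragments.
regex ~[ $[ "+Err/Frag" ]];
define removeFragments

# Exclude analyses tagged as non-standard orthography.
regex ~[ $[ "+Err/Orth" ]];
define removeNonStandardForms

# Hide the +Err/Orth tag on the analysis side while still passing it to the lexicon.
regex [ 0 <- "+Err/Orth" ];
define deleteErrOrthTag

# Defined for reference; not used in the compositions below.
regex ~[ $[ [ "+N" | "+V" ] ?* "+Err/Orth" ]];
define removeNonStandardNounVerbForms

# Keep only the parts of speech used in the dictionary.
regex $[ "+N" | "+V" | "+Ipc" | "+Pron" ];
define selectDictPOS

# Normative generator: standard forms only, morpheme boundaries kept.
set flag-is-epsilon ON
regex [ selectDictPOS .o. removeNonStandardForms .o. removeFragments .o. Morphology .o. Phonology ];
save stack generator-gt-dict-norm.hfst
define NormativeGenerator

# Strip the morpheme boundary symbols from the surface side.
regex [ [ "<" | ">" | "/" ] -> 0 ];
define removeBoundaries

load src/orthography/spellrelax.compose.hfst
define SpellRelax

# Descriptive analyser: non-standard forms accepted, spelling relaxed, then inverted.
regex [ deleteErrOrthTag .o. selectDictPOS .o. removeFragments .o. Morphology .o. Phonology .o. removeBoundaries .o. SpellRelax ];
# regex [ NormativeGenerator .o. removeBoundaries .o. SpellRelax ];
invert net
save stack analyser-gt-dict-desc.hfst
define DescriptiveAnalyser
```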

@nienna73 This should fix at least some of the obvious glitches that have accumulated in itwêwina, such as adding -im- to the most obvious possessed nouns, but I still have some substantial revision to complete.

The script above could be used to generate the dictionary versions of the two FSTs from the shared source, keeping everything else the same.
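Before running the script as part of an automated update, it may help to confirm that its inputs are present. The sketch below checks only for the three files the script reads, using the relative paths as they appear in the script; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, and the list would need adjusting if the lexicon is loaded from lexicon.hfst or lexicon.fomabin instead.

```python
from pathlib import Path
from typing import List

# Paths copied from the script above, relative to a lang-crk checkout.
# Assumption: the LEXC variant of the lexicon is the one being used.
REQUIRED_INPUTS = [
    "src/fst/lexicon.tmp.lexc",
    "src/fst/phonology.xfscript",
    "src/orthography/spellrelax.compose.hfst",
]


def missing_inputs(repo_root: str) -> List[str]:
    """Return the required input files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]
```

An update job could refuse to rebuild the FSTs whenever `missing_inputs(...)` returns a non-empty list, so a half-built checkout never replaces the working dictionary FSTs.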

Contributor Author

aarppe commented Jul 19, 2023

This also requires a description of the steps needed to create updated LEXC source from the various dictionary sources. Those steps are documented in UAlbertaALTLab/crk-db#108.

fbanados moved this to To do in Third release Aug 2, 2024