Arabic inflectional morphology generator
Alyahmor produces a word form from a (prefix, lemma, suffix) triple. Its main functions are:
- Generate word forms from a given word and affixes
- Generate all word forms by adding verbal or nominal affixes according to the word type
- Generate all affix combinations for verbs or nouns, which can be used in morphological analysis.
Taha Zerrouki: http://tahadz.com taha dot zerrouki at gmail dot com
| Features | value |
|---|---|
| Authors | Authors.md |
| Release | 0.2 |
| License | GPL |
| Tracker | linuxscout/alyahmor/Issues |
| Accounts |
If you cite Alyahmor in academic work, please use this citation:
T. Zerrouki, Alyahmor, Arabic morphological generator Library for python, https://pypi.python.org/pypi/alyahmor/, 2019
or in BibTeX format:
@misc{zerrouki2019alyahmor,
title={alyahmor, Arabic morphological generator Library for python.},
author={Zerrouki, Taha},
url={https://pypi.python.org/pypi/alyahmor},
year={2019}
}
- Text Stemming
- Morphological analysis
- Text Classification and categorization
- Spellchecking
- Arabic word Light Stemming.
- Features:
  - Generate word forms from a given word and affixes
  - Generate all word forms by adding verbal or nominal affixes according to the word type
  - Generate all affix combinations for verbs or nouns, which can be used in morphological analysis.
  - Generate stopword forms
pip install alyahmor
pip install -r requirements.txt
- libQutrub: Qutrub verb conjugation library: http://pypi.pyton/LibQutrub
- PyArabic: Arabic language tools library : http://pypi.pyton/pyarabic
- Arramooz-pysqlite : Arabic dictionary
Al-Yahmur (اليحمور) is al-Hasan ibn al-Ma'ali al-Baqillani, Abu Ali, the grammarian al-Hilli, the leading authority on Arabic of his day in Baghdad and a student of Abu al-Baqa' al-Ukbari (d. 637 AH).
He copied out by hand a great deal of literature, philology, and other disciplines. He was known for his high ambition and his eagerness to acquire knowledge despite his advanced age and weak eyesight, and for his extensive memorized learning, truthfulness, reliability, humility, and noble character.
Late in life he moved to the Shafi'i school, and leadership in grammar passed to him. He was born in 568 AH and died in 637 AH.
It joins a word with its affixes and applies the necessary spelling adjustments, for example:
بال+كتاب+ين => بالكتابين
ب+أبناء+ه => بأبنائه
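Why an adjustment step is needed becomes visible as soon as the parts are simply concatenated. The sketch below is plain Python and does not call Alyahmor; `naive_join` is a hypothetical helper introduced only for this illustration. It reproduces بالكتابين, but for ب + أبناء + ه it yields بأبناءه rather than the expected بأبنائه, where the hamza moves onto a yeh seat.

```python
def naive_join(proclitic, word, suffix="", enclitic=""):
    """Concatenate the parts with no orthographic adjustment (illustration only)."""
    return proclitic + word + suffix + enclitic


# Plain concatenation happens to be enough for this case.
first = naive_join("بال", "كتاب", suffix="ين")
print(first == "بالكتابين")

# Here it is not enough: the hamza keeps its stand-alone shape.
naive = naive_join("ب", "أبناء", enclitic="ه")
print(naive == "بأبنائه")
```

The second comparison fails, which is the kind of case the library's correction step is meant to cover.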
To generate all forms of the word كتاب as a noun, use
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتَاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun")
>>> noun_forms
[u'آلْكِتَاب', u'آلْكِتَابا', u'آلْكِتَابات', u'آلْكِتَابان', u'آلْكِتَابة', u'آلْكِتَابتان', u'آلْكِتَابتين', u'آلْكِتَابون', u'آلْكِتَابي', u'آلْكِتَابيات',
....]
To generate all forms of the word استعمل as a verb, use
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"استعمل"
>>> verb_forms = generator.generate_forms(word, word_type="verb")
>>> verb_forms
[u'أَأَسْتَعْمِلَ', u'أَأَسْتَعْمِلَكَ', u'أَأَسْتَعْمِلَكُمَا', u'أَأَسْتَعْمِلَكُمْ', u'أَأَسْتَعْمِلَكُنَّ', u'أَأَسْتَعْمِلَنَا', u'أَأَسْتَعْمِلَنِي', u'أَأَسْتَعْمِلَنَّ', u'أَأَسْتَعْمِلَنَّكَ', u'أَأَسْتَعْمِلَنَّكُمَا',
....]
To generate all forms of the word إلى as a stopword, use
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = "إلى"
>>> stop_forms = generator.generate_forms(word, word_type="stopword")
>>> stop_forms
['أَإِلَى', 'أَإِلَيَّ', 'أَإِلَيْكَ', 'أَإِلَيْكُمَا', 'أَإِلَيْكُمْ', 'أَإِلَيْكُنَّ', 'أَإِلَيْكِ', 'أَإِلَيْنَا',
....]
To generate all forms of the word كتاب as a noun without vocalization, use
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتَاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", vocalized=False)
>>> noun_forms
[u'آلكتاب', u'آلكتابا', u'آلكتابات', u'آلكتابان', u'آلكتابة', u'آلكتابتان', u'آلكتابتين', u'آلكتابون', u'آلكتابي', u'آلكتابيات',
....]
To generate all forms of the word كتاب as a noun, returned as a dict that groups the vocalized forms by their unvocalized form, use
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتَاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", indexed=True)
>>> noun_forms
{u'ุฃููุชุงุจุฉ': [u'ุฃููููุชููุงุจูุฉู', u'ุฃููููุชููุงุจูุฉู'],
u'ุฃูููุชุงุจุฉ': [u'ุฃูููููููุชููุงุจูุฉู', u'ุฃูููููููุชููุงุจูุฉู'],
u'ููุชุงุจูุงุชูู
': [u'ููููุชููุงุจูุงุชููู
ู', u'ููููุชููุงุจูููุงุชูููู
ู', u'ููููุชููุงุจูููุงุชูููู
ู', u'ููููุชููุงุจูููุงุชูููู
ู', u'ููููุชููุงุจูุงุชููู
ู'],
u'ููุชุงุจูุงุชูู': [u'ููููุชููุงุจูุงุชููููู', u'ููููุชููุงุจูุงุชููููู', u'ููููุชููุงุจูููุงุชูููููู', u'ููููุชููุงุจูููุงุชูููููู', u'ููููุชููุงุจูููุงุชูููููู'],
u'ููููุชุงุจุงุช': [u'ููููููููุชููุงุจูุงุชู', u'ููููููููุชููุงุจุงุช'],
u'ุฃุจูุชุงุจุชูู': [u'ุฃูุจูููุชููุงุจูุชูููููู'],
u'ุฃุจูุชุงุจุชูู
': [u'ุฃูุจูููุชููุงุจูุชูููู
ู'],
u'ุฃูุชุงุจูุงุชูู': [u'ุฃูููุชููุงุจูุงุชููููู', u'ุฃูููุชููุงุจูููุงุชูููููู', u'ุฃูููุชููุงุจูุงุชููููู', u'ุฃูููุชููุงุจูููุงุชูููููู', u'ุฃูููุชููุงุจูููุงุชูููููู'],
u'ููุชุงุจุงุชูู
': [u'ููููุชููุงุจุงุชููู
ู', u'ููููุชููุงุจูุงุชูููู
ู', u'ููููุชููุงุจูุงุชูููู
ู', u'ููููุชููุงุจุงุชููู
ู', u'ููููุชููุงุจูุงุชูููู
ู'],
u'ุจูุชุงุจูุงุชูู': [u'ุจูููุชููุงุจูููุงุชูููููู', u'ุจูููุชููุงุจูุงุชููููู'],
....
}
The detailed form contains:
- vocalized word form, example: "وَكِتَابَاتُنَا"
- semi-vocalized: the word without the case mark, example: "وَكِتَابَاتنَا"
- segmented form: the affix parts and the word, laid out as proclitic-prefix-word-suffix-enclitic, for example: و--كتاب-ات-نا
- tags: عطف:جمع مؤنث سالم:ضمير متصل (conjunction : sound feminine plural : attached pronoun)
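As a rough illustration of how the segmented form could be consumed, the sketch below splits a string of the shape proclitic-prefix-word-suffix-enclitic into named fields. It is plain Python, not part of Alyahmor; `Segments` and `split_segmented` are hypothetical names, and the code assumes exactly five hyphen-separated fields with no hyphen inside any field. Strings with a different number of fields are rejected rather than guessed at.

```python
from typing import NamedTuple


class Segments(NamedTuple):
    """Named view of a segmented form (hypothetical helper type)."""
    proclitic: str
    prefix: str
    word: str
    suffix: str
    enclitic: str


def split_segmented(segmented: str) -> Segments:
    """Split 'proclitic-prefix-word-suffix-enclitic' into its five fields."""
    parts = segmented.split("-")
    if len(parts) != 5:
        # Assumption: a well-formed segmented string has exactly five fields.
        raise ValueError("expected 5 fields, got %d" % len(parts))
    return Segments(*parts)


seg = split_segmented("و--كتاب-ات-نا")
print(seg.word, seg.suffix, seg.enclitic)
```

An empty field, such as the prefix in this example, simply comes back as an empty string.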
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتَاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", indexed=True, details=True)
>>> noun_forms
[{'vocolized': 'استعمل', 'semi-vocalized': 'استعمل', 'segmented': '-استعمل--', 'tags': '::'},
....]
Alyahmor generates affix lists for verbs and nouns:
>>> verb_affix = generator.generate_affix_list(word_type="verb", vocalized=True)
>>> verb_affix
[u'ุฃูููุณูุช-ููููู', u'ุฃู-ูููุง', u'ู-ูููู', u'ูููู-ุชุงูู', u'ูููููู-ููููู', u'ุฃูุช-ููููุง', u'ูู-ุงููููู', u'ู-ูููุง', u'ููุช-ูููุง', u'ุฃูู-ูููู
ูุง', ....]
>>> noun_affix = generator.generate_affix_list(word_type="noun", vocalized=True)
>>> noun_affix
[u'ุฃูู-ูุงุชูู', u'ูู-ูููุงุชูููู
ูุง', u'ุฃูู-ูุงุชูู', u'ุฃููููู-ููููุง', u'ุฃููู-ููููููููู', u'ุฃููู-ูููู
ูุง', u'ุฃููู-ููููุชูููู
ู', u'ุฃููููู-ูุงุชููู
ู', u'ููุจู-ููููููู
ู', u'ูููู-ููููุชูููุง', ....]
Generate unvocalized affixes:
>>> noun_affix = generator.generate_affix_list(word_type="noun", vocalized=False)
>>> noun_affix
[u'-', u'-ا', u'-ات', u'-اتك', u'-اتكم', u'-اتكما', u'-اتكن', u'-اتنا', u'-اته', u'-اتها', ...]
Alyahmor generates word forms for given affixes:
- the affixes parameter is a list of four elements, in this order:
  - proclitic
  - prefix
  - suffix
  - enclitic
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتَاب"
>>> generator.generate_by_affixes(word, word_type="noun", affixes=[u"بال", u"", u"ين", u""])
['ุจูุงููููุชููุงุจูู']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"ูู", u"", u"ู", u""])
['ููููููุชููุงุจู']
>>> generator.generate_by_affixes( word, word_type="noun", affixes = [u"ู", u"", u"", u""])
['ููููุชููุงุจ']
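Because the four positions in the affixes list are easy to mix up, one option is to build the list from named fields. The sketch below is only a convenience wrapper under that assumption; `Affixes` and `generate_noun_form` are hypothetical names, not part of Alyahmor. The only library calls used are `genelex()` and `generate_by_affixes(...)`, as shown in the session above, and the library is imported lazily so the helper type can be used without it.

```python
from typing import List, NamedTuple


class Affixes(NamedTuple):
    """Named fields in the order expected by the affixes list (hypothetical helper)."""
    proclitic: str = ""
    prefix: str = ""
    suffix: str = ""
    enclitic: str = ""

    def as_list(self) -> List[str]:
        # Order: proclitic, prefix, suffix, enclitic.
        return list(self)


def generate_noun_form(word: str, affixes: Affixes):
    """Thin wrapper around generate_by_affixes; requires alyahmor to be installed."""
    import alyahmor.genelex  # imported lazily, only when the wrapper is called
    generator = alyahmor.genelex.genelex()
    return generator.generate_by_affixes(
        word, word_type="noun", affixes=affixes.as_list()
    )


parts = Affixes(proclitic="بال", suffix="ين").as_list()
print(parts)
```

With the library installed, `generate_noun_form("كِتَاب", Affixes(proclitic="بال", suffix="ين"))` would issue the same call as the first example above.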
| File/directory | Category | Description |
|---|---|---|
| tests/samples/dataset.csv | | A list of verified affixes |
