
Alyahmor اليحمور

Arabic inflectional morphology generator


Description

Alyahmor produces a word form from a (prefix, lemma, suffix) triple. Its main functions are:

  • Generate word forms from a given word and affixes.
  • Generate all word forms by adding verbal or nominal affixes, according to the word type.
  • Generate all affix combinations for verbs or nouns, which can be used in morphological analysis.

The Alyahmor library generates word forms from (stem, prefixes, suffixes). It serves functions such as:

  • Building word forms from a given word and affixes.
  • Generating word forms by adding nominal or verbal affixes according to the word type.
  • Generating affix lists for verbs or nouns, for use in morphological analysis.

Developers:

Taha Zerrouki: http://tahadz.com taha dot zerrouki at gmail dot com

Feature    Value
Authors    Authors.md
Release    0.2
License    GPL
Tracker    linuxscout/alyahmor/Issues
Accounts   @Twitter

Citation

If you use Alyahmor in academic work, please cite it as:

T. Zerrouki, Alyahmor, Arabic morphological generator library for Python, https://pypi.python.org/pypi/alyahmor/, 2019

or, in BibTeX format:

@misc{zerrouki2019alyahmor,
  title={alyahmor, Arabic morphological generator Library for python.},
  author={Zerrouki, Taha},
  url={https://pypi.python.org/pypi/alyahmor},
  year={2019}
}

Applications

  • Text Stemming
  • Morphology analysis
  • Text Classification and categorization
  • Spellchecking

Features

  • Arabic word light stemming.
  • Generate word forms from a given word and affixes.
  • Generate all word forms by adding verbal or nominal affixes, according to the word type.
  • Generate all affix combinations for verbs or nouns, which can be used in morphological analysis.
  • Generate stopword forms.

Installation

pip install alyahmor

Requirements

pip install -r requirements.txt 

Origin of the name

Al-Yahmur (اليَحْمُور) is al-Hasan ibn al-Ma'ali al-Baqillani, Abu Ali, the grammarian, al-Hilli. He was the leading scholar of Arabic in Baghdad in his time and a student of Abu al-Baqa' al-'Ukbari. He died in 637 AH.

He copied in his own hand many works on literature, language, and the other disciplines. He was known for his high ambition and great diligence, and he kept acquiring knowledge despite his advanced age and weak eyesight. He was also known for how much he had memorized, and for his truthfulness, reliability, humility, and noble character.

Late in life he moved to the Shafi'i school, and leadership in grammar passed to him. He was born in 568 AH and died in 637 AH.

Usage

Example

Generate words forms

It joins a word with its affixes and applies the necessary orthographic corrections. For example:

بال+كتاب+ين => بالكتابين
ب+أبناء+ه => بأبنائه
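The kind of correction shown in these two examples can be illustrated with a deliberately small sketch. It is not Alyahmor's implementation: `join_affixes`, `GENITIVE_PREFIXES`, and `ATTACHED_PRONOUNS` are hypothetical names, and the function applies only one spelling rule (a word-final hamza ء is written ئ when the word follows a preposition such as ب and carries an attached pronoun). The library itself covers many more cases and also vocalizes the result.

```python
# Toy illustration only: NOT Alyahmor's algorithm.
# Prepositions that put the following noun in the genitive (hypothetical subset).
GENITIVE_PREFIXES = {"ب", "ل", "ك"}
# A few attached pronouns (hypothetical subset).
ATTACHED_PRONOUNS = {"ه", "ها", "هم", "هن", "ك", "كم", "كن", "نا"}


def join_affixes(prefix: str, word: str, suffix: str) -> str:
    """Concatenate prefix + word + suffix, fixing the seat of a final hamza."""
    stem = word
    if (
        prefix in GENITIVE_PREFIXES
        and suffix in ATTACHED_PRONOUNS
        and stem.endswith("ء")
    ):
        # Genitive kasra before the pronoun: hamza moves onto a yaa seat.
        stem = stem[:-1] + "ئ"
    return prefix + stem + suffix


print(join_affixes("بال", "كتاب", "ين"))
print(join_affixes("ب", "أبناء", "ه"))
```

A rule table like this grows quickly for real Arabic orthography, which is the reason to use a generator library rather than plain string concatenation.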

Nouns

To generate all forms of the word كتاب as a noun, use:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun")
>>> noun_forms
[u'آلْكِتَاب', u'آلْكِتَابا', u'آلْكِتَابات', u'آلْكِتَابان', u'آلْكِتَابة', u'آلْكِتَابتان', u'آلْكِتَابتين', u'آلْكِتَابون', u'آلْكِتَابي', u'آلْكِتَابيات'
....]

Verbs

To generate all forms of the word استعمل as a verb, use:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"استعمل"
>>> verb_forms = generator.generate_forms(word, word_type="verb")
>>> verb_forms
[u'أَأَسْتَعْمِلَ', u'أَأَسْتَعْمِلَكَ', u'أَأَسْتَعْمِلَكُمَا', u'أَأَسْتَعْمِلَكُمْ', u'أَأَسْتَعْمِلَكُنَّ', u'أَأَسْتَعْمِلَنَا', u'أَأَسْتَعْمِلَنِي', u'أَأَسْتَعْمِلَنَّ', u'أَأَسْتَعْمِلَنَّكَ', u'أَأَسْتَعْمِلَنَّكُمَا',
....]

Stop words

To generate all forms of the word إلى as a stopword, use:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = "إلى"
>>> stop_forms = generator.generate_forms(word, word_type="stopword")
>>> stop_forms
['أَإِلَى', 'أَإِلَييّ', 'أَإِلَيْكَ', 'أَإِلَيْكُمَا', 'أَإِلَيْكُمْ', 'أَإِلَيْكُنَّ', 'أَإِلَيْكِ', 'أَإِلَيْنَا',
....]

Generate unvocalized forms

To generate all forms of the word كتاب as a noun without vocalization, use:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", vocalized=False)
>>> noun_forms
[u'آلكتاب', u'آلكتابا', u'آلكتابات', u'آلكتابان', u'آلكتابة', u'آلكتابتان', u'آلكتابتين', u'آلكتابون', u'آلكتابي', u'آلكتابيات',
....]
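Judging from the sample outputs, the unvocalized forms look like the vocalized ones with the diacritics removed. The sketch below shows that relationship with the standard library only. `strip_vocalization` is a hypothetical helper, not part of Alyahmor, and whether the library produces its unvocalized list this way is an assumption.

```python
import re

# Arabic tanwin, short vowels, shadda and sukun occupy U+064B..U+0652.
_TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_vocalization(text: str) -> str:
    """Return the text with Arabic diacritics removed (hypothetical helper)."""
    return _TASHKEEL.sub("", text)


vocalized = ["آلْكِتَاب", "آلْكِتَابات"]
print([strip_vocalization(form) for form in vocalized])
```

Letters such as آ (alef with madda) are single code points outside that range, so they are left untouched.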

Generate a dictionary of vocalized forms indexed by unvocalized form

To generate all forms of the word كتاب as a noun, as a dict that groups the vocalized forms under their unvocalized form, use:

>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", indexed=True)
>>> noun_forms
{u'أككتابة': [u'أكَكِتَِابَةِ', u'أكَكِتَِابَةٍ'],
 u'أوككتابة': [u'أَوَكَكِتَِابَةِ', u'أَوَكَكِتَِابَةٍ'],
 u'وكتابياتهم': [u'وَكِتَِابياتهِمْ', u'وَكِتَِابِيَاتُهُمْ', u'وَكِتَِابِيَاتِهِمْ', u'وَكِتَِابِيَاتِهِمْ', u'وَكِتَِابياتهُمْ'],
 u'وكتابياتهن': [u'وَكِتَِابياتهِنَّ', u'وَكِتَِابياتهُنَّ', u'وَكِتَِابِيَاتِهِنَّ', u'وَكِتَِابِيَاتِهِنَّ', u'وَكِتَِابِيَاتُهُنَّ'],
 u'وللكتابات': [u'وَلِلْكِتَِابَاتِ', u'وَلِلْكِتَِابات'],
 u'أبكتابتكن': [u'أَبِكِتَِابَتِكُنَّ'],
 u'أبكتابتكم': [u'أَبِكِتَِابَتِكُمْ'],
 u'أكتابياتهن': [u'أَكِتَِابياتهِنَّ', u'أَكِتَِابِيَاتِهِنَّ', u'أَكِتَِابياتهُنَّ', u'أَكِتَِابِيَاتُهُنَّ', u'أَكِتَِابِيَاتِهِنَّ'],
 u'فكتاباتهم': [u'فَكِتَِاباتهِمْ', u'فَكِتَِابَاتُهُمْ', u'فَكِتَِابَاتِهِمْ', u'فَكِتَِاباتهُمْ', u'فَكِتَِابَاتِهِمْ'],
 u'بكتابياتكن': [u'بِكِتَِابِيَاتِكُنَّ', u'بِكِتَِابياتكُنَّ'],
....
}
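The shape of this result, one unvocalized key mapped to every vocalized spelling that collapses onto it, can be reproduced with a few lines of plain Python. This is a sketch under assumptions: `index_by_unvocalized` is a hypothetical helper, and the grouping key is simply the form with diacritics stripped, which may differ from how Alyahmor builds its index.

```python
import re
from collections import defaultdict

# Arabic tanwin, short vowels, shadda and sukun occupy U+064B..U+0652.
_TASHKEEL = re.compile("[\u064B-\u0652]")


def index_by_unvocalized(forms):
    """Group vocalized forms under their diacritic-free spelling."""
    index = defaultdict(list)
    for form in forms:
        key = _TASHKEEL.sub("", form)
        index[key].append(form)
    return dict(index)


forms = ["وَلِلْكِتَابَاتِ", "وَلِلْكِتَابات", "أَبِكِتَابَتِكُمْ"]
for key, group in index_by_unvocalized(forms).items():
    print(key, len(group))
```

An index like this lets an analyzer look up an undiacritized input word once and get back all of its candidate vocalizations.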

Generate detailed forms

Each detailed form contains:

  • vocalized: the vocalized word form, for example "فكِتَابَاتُنَا"
  • semi-vocalized: the word without the case mark, for example "فكِتَابَاتنَا"
  • segmented: the affix parts and the word, in the order proclitic-prefix-word-suffix-enclitic, for example و--كتاب-ات-نا
  • tags: for example عطف:جمع مؤنث سالم:ضمير متصل (conjunction : sound feminine plural : attached pronoun)
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> noun_forms = generator.generate_forms(word, word_type="noun", indexed=True, details=True)
>>> noun_forms
  [{'vocolized': 'استعمل', 'semi-vocalized': 'استعمل', 'segmented': '-استعمل--', 'tags': '::'},
   {'vocolized': 'استعملي', 'semi-vocalized': 'استعملي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
   {'vocolized': 'استعملِي', 'semi-vocalized': 'استعملِي', 'segmented': '-استعمل--ي', 'tags': ':مضاف:'},
   {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
   {'vocolized': 'استعملكَ', 'semi-vocalized': 'استعملكَ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
   {'vocolized': 'استعملكِ', 'semi-vocalized': 'استعملكِ', 'segmented': '-استعمل--ك', 'tags': ':مضاف:'},
   {'vocolized': 'استعملكُمُ', 'semi-vocalized': 'استعملكُمُ', 'segmented': '-استعمل--كم', 'tags': ':مضاف:'},
   ....]

Generate affixes lists

Alyahmor generates affix lists for verbs and nouns. Each entry has the form prefix-suffix:

>>> verb_affix = generator.generate_affix_list(word_type="verb", vocalized=True)
>>> verb_affix
[u'أَفَسَت-يننِي', u'أَ-ونَا', u'ي-ونكَ', u'فَلَ-تاكَ', u'وَلََن-هُنَّ', u'أَت-وننَا', u'وَ-اكُنَّ', u'ن-ننَا', u'وَت-وهَا', u'أَي-نهُمَا', ....]

>>> noun_affix = generator.generate_affix_list(word_type="noun", vocalized=True)
>>> noun_affix
[u'أكَ-ياتكَ', u'فَ-ِيَاتُكُمَا', u'أكَ-ياتكِ', u'أَوَكَ-ِينَا', u'أَلِ-ِيِّهِنَّ', u'أَفَ-َكُمَا', u'أَفَ-ِيَّتُهُمْ', u'أَفَكَ-ياتهِمْ', u'فَبِ-ِيِّكُمْ', u'وَلِ-ِيَّتِهَا', ....]

Generate Unvocalized affixes

>>> noun_affix = generator.generate_affix_list(word_type="noun", vocalized=False)
>>> noun_affix
[u'-', u'-ا', u'-ات', u'-اتك', u'-اتكم', u'-اتكما', u'-اتكن', u'-اتنا', u'-اته', u'-اتها', ...]
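One way such a list could feed a morphological analyzer is to test each prefix-suffix pair against an input word and keep the stems that remain. The sketch below assumes the `prefix-suffix` string format seen in the output above. `candidate_segmentations` is a hypothetical helper, and the entries `ال-` and `بال-ين` in the sample list are illustrative values, not taken from the library's output.

```python
def candidate_segmentations(word, affixes):
    """Return (prefix, stem, suffix) for every affix pair that fits the word."""
    results = []
    for affix in affixes:
        prefix, _, suffix = affix.partition("-")
        # Skip pairs that would leave an empty stem.
        if len(prefix) + len(suffix) >= len(word):
            continue
        if word.startswith(prefix) and word.endswith(suffix):
            stem = word[len(prefix):len(word) - len(suffix)]
            results.append((prefix, stem, suffix))
    return results


# "-", "-ا", "-ات", "-اتك" appear in the sample output; the last two are assumed.
AFFIXES = ["-", "-ا", "-ات", "-اتك", "ال-", "بال-ين"]

for segmentation in candidate_segmentations("بالكتابين", AFFIXES):
    print(segmentation)
```

Each candidate stem would then still need to be checked against a lexicon, since a matching affix pair alone does not prove the segmentation is valid.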

Generate word forms by affixes

Alyahmor generates word forms for given affixes.

The affixes parameter is a list of four elements, in this order:

  • proclitic
  • prefix
  • suffix
  • enclitic
>>> import alyahmor.genelex
>>> generator = alyahmor.genelex.genelex()
>>> word = u"كِتِاب"
>>> generator.generate_by_affixes(word, word_type="noun", affixes=[u"بال", u"", u"ين", u""])
['بِالْكِتَِابين']
>>> generator.generate_by_affixes(word, word_type="noun", affixes=[u"وك", u"", u"ِ", u""])
['وَكَكِتَِابِ']
>>> generator.generate_by_affixes(word, word_type="noun", affixes=[u"و", u"", u"", u""])
['وَكِتَِاب']
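Because the four positions are easy to mix up, a small named container can make the order explicit before the list is passed on. `Affixes` here is a hypothetical convenience type, not something Alyahmor provides. The only library call referenced is `generate_by_affixes` as used in the session above, and it is left as a comment so the sketch runs without the package installed.

```python
from typing import NamedTuple


class Affixes(NamedTuple):
    """Four affix slots in the order the affixes list expects (hypothetical helper)."""
    proclitic: str = ""
    prefix: str = ""
    suffix: str = ""
    enclitic: str = ""


affixes = Affixes(proclitic="بال", suffix="ين")
print(list(affixes))

# With Alyahmor installed, the list could be passed as in the session above:
# generator.generate_by_affixes(word, word_type="noun", affixes=list(affixes))
```

Unset slots default to the empty string, matching the `u""` placeholders in the examples.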
 

Files

  • tests/samples/dataset.csv: a list of verified affixes
