UniversalDependencies
diff --git a/_ar/syntax.md b/_ar/syntax.md
Lines changed: 0 additions & 1 deletion
diff --git a/_ar/tokenization.md b/_ar/tokenization.md
Lines changed: 0 additions & 1 deletion
diff --git a/_be/syntax.md b/_be/syntax.md
Lines changed: 0 additions & 1 deletion
diff --git a/_be/tokenization.md b/_be/tokenization.md
Lines changed: 13 additions & 14 deletions
diff --git a/_bg/morphology.md b/_bg/morphology.md
Lines changed: 0 additions & 1 deletion
diff --git a/_bg/specific-syntax.md b/_bg/specific-syntax.md
Lines changed: 1 addition & 2 deletions
diff --git a/_ckb/specific-syntax.md b/_ckb/specific-syntax.md
Lines changed: 1 addition & 2 deletions
diff --git a/_cop/morphology.md b/_cop/morphology.md
Lines changed: 3 additions & 4 deletions
diff --git a/_cop/specific-syntax.md b/_cop/specific-syntax.md
Lines changed: 3 additions & 4 deletions
diff --git a/_cop/tokenization.md b/_cop/tokenization.md
Lines changed: 1 addition & 2 deletions
diff --git a/_el/specific-syntax.md b/_el/specific-syntax.md
Lines changed: 6 additions & 7 deletions
diff --git a/_el/syntax.md b/_el/syntax.md
Lines changed: 0 additions & 12 deletions
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: ar/overview/syntax.html
 ---
 
 # Syntax
 
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Tokenization'
-permalink: ar/overview/tokenization.html
 ---
 
 # Tokenization
 
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: be/overview/syntax.html
 ---
 
 # Syntax
 
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Tokenization'
-permalink: be/overview/tokenization.html
 ---
 
 # Tokenization
@@ -11,18 +10,18 @@ The low-level tokenization of the Belarusian UD treebank generally adopts the RN
 * In general, tokens are delimited by whitespace. The regexp [А-zА-яЁёУўі\-]+ usually corresponds to one token.
 * Punctuation (recognized by the corresponding Unicode property) that is conventionally written adjacent to the preceding or following word is separated during tokenization.
 * Each punctuation mark is treated as a single token, e.g. the following sequence: <b>)", -</b> becomes four tokens, <b>)</b> , <b>"</b>, <b>,</b>, and <b>-"</b>. Exceptions are conventional multi-character punctuation marks: <b>--</b> , <b>...</b> , <b>?!</b> ,  etc., and emojis and smileys: <b>:)</b> , <b>^_^</b>, etc.
-* Conventional non-cyrillic multi-character terms are tokenized as single tokens: <b>°С</b>, <b>км2</b>. 
-
-Some special cases worth mentioning: 
-* Numerical expressions including decimal numbers, such as <b>245</b>, <b>3,14</b>, are treated as single tokens. 
-* Time expressions like <b>20:55</b> are splitted into separate tokens (in this case, three { <b>20</b> , <b>:</b> , <b>55</b> }). 
-* Dates like <b>20.04.2012</b> are splitted into separate tokens (in this case, five { <b>20</b> , <b>.</b> , <b>04</b> , <b>.</b> , <b>2012</b> }). 
-* Special symbols before and after numerical expressions, as in <b>$500</b> , <b>2,67%</b> , <b>+27°С</b> , are tokenised separately (so, the tokens are { <b>$</b> , <b>500</b> } , { <b>2,67</b> , <b>%</b> } , { <b>+</b> , <b>27</b> , <b>°С</b> }). 
-* Numerical expressions with hyphen and cyrillic endings (e.g. <b>1-ый</b> “1st”, <b>3-м</b> “3rd.Ins”) as well as adjectives and other non-numerals which contain digits (e.g. <b>79-гадовы</b> “79 year old”, <b>500-годдзе</b> “500th anniversary”) are treated as single tokens. 
-* Other words with hyphen are treated as single tokens, except for the cases then the first part is inflected. Examples: { <b>з-за</b> } “because of”, { <b>зялёна-шэрых</b> } “green-gray”, { <b>Санкт-Пецярбург</b> } “St. Petersburg”, but { <b>Ростове</b> , <b>-</b> , <b>на</b> , <b>-</b> , <b>Дону</b>} “(in) Rostov on Don”. 
+* Conventional non-cyrillic multi-character terms are tokenized as single tokens: <b>°С</b>, <b>км2</b>.
+
+Some special cases worth mentioning:
+* Numerical expressions including decimal numbers, such as <b>245</b>, <b>3,14</b>, are treated as single tokens.
+* Time expressions like <b>20:55</b> are splitted into separate tokens (in this case, three { <b>20</b> , <b>:</b> , <b>55</b> }).
+* Dates like <b>20.04.2012</b> are splitted into separate tokens (in this case, five { <b>20</b> , <b>.</b> , <b>04</b> , <b>.</b> , <b>2012</b> }).
+* Special symbols before and after numerical expressions, as in <b>$500</b> , <b>2,67%</b> , <b>+27°С</b> , are tokenised separately (so, the tokens are { <b>$</b> , <b>500</b> } , { <b>2,67</b> , <b>%</b> } , { <b>+</b> , <b>27</b> , <b>°С</b> }).
+* Numerical expressions with hyphen and cyrillic endings (e.g. <b>1-ый</b> “1st”, <b>3-м</b> “3rd.Ins”) as well as adjectives and other non-numerals which contain digits (e.g. <b>79-гадовы</b> “79 year old”, <b>500-годдзе</b> “500th anniversary”) are treated as single tokens.
+* Other words with hyphen are treated as single tokens, except for the cases then the first part is inflected. Examples: { <b>з-за</b> } “because of”, { <b>зялёна-шэрых</b> } “green-gray”, { <b>Санкт-Пецярбург</b> } “St. Petersburg”, but { <b>Ростове</b> , <b>-</b> , <b>на</b> , <b>-</b> , <b>Дону</b>} “(in) Rostov on Don”.
 * Abbreviations are treated as single tokens, whitespaces split the abbreviations.
 * Abbreviations marked by a period, as in <b>стр.</b> “p. (page)”, <b>П.</b> “P. (for Peter)”, are treated as single tokens. If the period overlaps with the end of sentence period then it is written once as a separate token (denoting end-of-sentence), e.g. { <b>1914</b> , <b>г</b> , <b>.</b> } “year 1914”.
-* Abbreviations can not contain a period inside, i.e. the patterns like <b>і т.д.</b> “and so on”, <b>да т.п.</b> “and so forth” are splitted into three tokens: { <b>i</b> , <b>т.</b> , <b>д.</b> }, { <b>да</b> , <b>т.</b> , <b>п.</b> }. 
+* Abbreviations can not contain a period inside, i.e. the patterns like <b>і т.д.</b> “and so on”, <b>да т.п.</b> “and so forth” are splitted into three tokens: { <b>i</b> , <b>т.</b> , <b>д.</b> }, { <b>да</b> , <b>т.</b> , <b>п.</b> }.
 * Email addresses, URLs, and tweet-style names are treated as single tokens: {[email protected]}, {https://github.com}, {@anna_li}
 
 The Belarusian UD treebank does not contain multiword tokens.
@@ -35,11 +34,11 @@ The Belarusian UD treebank does not contain multiword tokens.
 
 ### Verb forms, analytical grammatical forms, negation
 
-* reflexive verbs are kept as a single token (orthographic rule): <b>з'яўляецца</b> “is”. 
+* reflexive verbs are kept as a single token (orthographic rule): <b>з'яўляецца</b> “is”.
 * the forms of subjunctive mood, analytical passive, complex future tense, complex comparatives, etc. are splitted
-according to the orthographic principle: { <b>маглі</b> , <b>б</b> } “would be able, could”, { <b>былі</b> , <b>зафіксаваныя</b> } “were recorded”, { <b>будзе</b> , <b>трымацца</b> } “will be held”, { <b>больш</b> , <b>сур'ёзныя</b> } “more serious” 
+according to the orthographic principle: { <b>маглі</b> , <b>б</b> } “would be able, could”, { <b>былі</b> , <b>зафіксаваныя</b> } “were recorded”, { <b>будзе</b> , <b>трымацца</b> } “will be held”, { <b>больш</b> , <b>сур'ёзныя</b> } “more serious”
 * <b>не</b> and <b>ни</b> used as negation markers with verbs, pronouns and other words are tokenized according to the orthographic rules: { <b>не</b> , <b>рэагуючы</b> } “not reacting”, { <b>ні</b> , <b>ў</b> , <b>каго</b> } “at no one”, but { <b>непапраўнай</b> } “irrecoverable”, { <b>незавершаны</b> } “not finished”, { <b>ніякіх</b> } “no one”.
-* паў- and напаў- “half” are never kept separate: <b>паўбеспрацоўны</b> “half-unemployed”, <b>напаўзабыты</b> “half-forgotten”. 
+* паў- and напаў- “half” are never kept separate: <b>паўбеспрацоўны</b> “half-unemployed”, <b>напаўзабыты</b> “half-forgotten”.
 
 ### Character set
 
 
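Reviewer note on `_be/tokenization.md`: the low-level rules in the hunks above can be illustrated with a small regular-expression sketch. This is not the treebank's actual tokenizer; `TOKEN_RE` and `tokenize` are hypothetical names, and the pattern covers only a subset of the documented rules (decimals, times, dates, symbols around numbers, hyphenated words, multi-character punctuation, URLs, tweet-style names). It does not split inflected first parts such as Ростове-на-Дону, does not keep км2 together, and does not handle abbreviations or email addresses.

```python
import re

# Hypothetical sketch of a subset of the documented rules.
# Alternatives are ordered: earlier patterns win over later ones.
TOKEN_RE = re.compile(
    r"""
    https?://\S+                    # URLs stay whole
  | @\w+                            # tweet-style names stay whole
  | \d+-[^\W\d_]+(?:-[^\W\d_]+)*    # digits, hyphen, letters: 1-ый, 79-гадовы
  | \d+,\d+                         # decimal numbers with a comma: 3,14
  | \d+                             # plain integers (so 20:55 and dates are split)
  | °[CС]                           # degree sign plus Latin or Cyrillic C
  | [^\W\d_]+(?:['’][^\W\d_]+)*(?:-[^\W\d_]+(?:['’][^\W\d_]+)*)*  # words, incl. з-за
  | --|\.\.\.|\?!                   # conventional multi-character punctuation
  | \S                              # any other single symbol is its own token
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text (whitespace is discarded)."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for sample in ["20:55", "20.04.2012", "+27°С", "2,67%", "79-гадовы", "з-за"]:
        print(sample, "->", tokenize(sample))
```

Ordering the alternatives from most specific to least specific keeps the pattern readable; a production tokenizer would also need morphological information to decide when a hyphenated first part is inflected.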
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Morphology'
-permalink: bg/overview/morphology.html
 ---
 
 # Morphology
 
@@ -1,12 +1,11 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: bg/overview/specific-syntax.html
 ---
 
 # Specific constructions
 
-## Yes-No question particle 
+## Yes-No question particle
 
 In Bulgarian the Yes-No questions are formed with the question particle ли (li). At the moment this particle is annotated with the [cs-dep/discourse]() relation.
 
 
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: ckb/overview/specific-syntax.html
 ---
 
 # Specific constructions
@@ -14,6 +13,6 @@ We do not split off possessive inflection.
 
 ~~~ sdparse
 
-Mindalakanim \n my-children 
+Mindalakanim \n my-children
 
 ~~~
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Morphology'
-permalink: cop/overview/morphology.html
 udver: '2'
 ---
 
@@ -13,7 +12,7 @@ In keeping with other Universal Dependency treebanks, the Coptic dependency tree
 
 |Coptic Scriptorium | Universal Tags|
 |--------------------- |:---------------------|
-|AAOR  | AUX | 
+|AAOR  | AUX |
 |ACAUS | AUX |
 |ACOND | SCONJ |
 |ACONJ | AUX |
@@ -63,12 +62,12 @@ In keeping with other Universal Dependency treebanks, the Coptic dependency tree
 
 **Notes**
 
-The Universal POS tags do not map well onto Coptic tags in several cases; in all instances, the attempt has been made to choose the nearest category, especially with syntactic function in mind. The objective is to create dependency trees that connect similar categories to those of other languages. 
+The Universal POS tags do not map well onto Coptic tags in several cases; in all instances, the attempt has been made to choose the nearest category, especially with syntactic function in mind. The objective is to create dependency trees that connect similar categories to those of other languages.
 
 Most tripartite conjugation bases have been mapped to either auxiliaries (`AUX`), if they are main clause conjugations (past auxiliary APST, aorist AAOR, etc.) or not the main conjugation morpheme (e.g. future marker FUT, which may join a durative conjugation or irrealis preterit). For the subordinate conjugations (APREC, ALIM), the universal tag `SCONJ` (subordinating conjunction) is used.
 
 The category IMOD is cast as a form of `ADV`. While the alternatives of `ADP` (adposition) or `PART` (particle) are semantically appealing, the mapping to `ADV` best represents their sentential function and parallels the dependency label advmod. Note that this results in some adverbs carrying determiners, which is rather odd in terms of underlying categories for the syntax trees. It is perhaps similar to some extent to situations with the Stanford Typed label npadvmod, with the distinction that Coptic IMODs only attach to pronouns, never nouns.
 
-The existential predicates (EXIST) have been mapped as `VERB`, whereas the copula (COP) is mapped to `PTC`, since unlike in the case of existence, it does not contain the actual predicate, and is also absent in the interlocutive patterns. 
+The existential predicates (EXIST) have been mapped as `VERB`, whereas the copula (COP) is mapped to `PTC`, since unlike in the case of existence, it does not contain the actual predicate, and is also absent in the interlocutive patterns.
 
 Finally the converters have been treated similarly to conjugation bases, although they co-occur with the bases. Subordinate converters (CCIRC, CREL) are treated as `SCONJ`, while (potentially) main clause converters (CFOC, CPRET) are tagged as `AUX`. In all cases, we stress that these are not ideal tag assignments, but ones that aim to stay closest to the limited universal tag set’s behavior. For all new projects we recommend using Scriptorium tags and converting automatically to universal tags if necessary.
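Reviewer note on `_cop/morphology.md`: the notes recommend annotating with Scriptorium tags and converting to universal tags automatically. A minimal sketch of such a conversion follows, assuming a plain dictionary lookup. `SCRIPTORIUM_TO_UPOS` and `to_upos` are hypothetical names, and the table contains only the pairs visible in the hunks above (table rows plus the tags named in the notes); COP is left out because the tag the notes give for it is not repeated here. The full table in the file is longer.

```python
# Hypothetical partial mapping, restricted to tags visible in this diff.
SCRIPTORIUM_TO_UPOS = {
    # main-clause conjugation bases and the future marker
    "AAOR": "AUX",
    "ACAUS": "AUX",
    "ACONJ": "AUX",
    "APST": "AUX",
    "FUT": "AUX",
    # subordinate conjugations
    "ACOND": "SCONJ",
    "APREC": "SCONJ",
    "ALIM": "SCONJ",
    # other categories discussed in the notes
    "IMOD": "ADV",
    "EXIST": "VERB",
    # converters: subordinate ones vs. (potentially) main-clause ones
    "CCIRC": "SCONJ",
    "CREL": "SCONJ",
    "CFOC": "AUX",
    "CPRET": "AUX",
}


def to_upos(scriptorium_tag):
    """Return the universal tag for a Scriptorium tag, or None if unmapped here."""
    return SCRIPTORIUM_TO_UPOS.get(scriptorium_tag)


if __name__ == "__main__":
    for tag in ["APST", "CREL", "IMOD", "UNKNOWN"]:
        print(tag, "->", to_upos(tag))
```

Returning `None` for unknown tags makes gaps in the partial table explicit instead of silently guessing a category.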
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: cop/overview/specific-syntax.html
 udver: '2'
 ---
 
@@ -22,7 +21,7 @@ csubj(ⲉⲝⲉⲥⲧⲓ, ⲁⲁ)
 Greek conjunctions and particles that are non-coordinating (i.e. not meaning ‘and/or’) are labeled as `advmod` to their associated predicate, as in the following example:
 
 ~~~ sdparse
-ⲙⲏ ⲁⲣⲁ ⲉ ⲓ ⲟⲩⲏϩ ⲟⲛ ϩⲓϫⲛ ⲧ ⲙⲏⲧⲉ ⲛ ϫⲱ ⲕ \n After all do I still sit upon the middle of your head? 
+ⲙⲏ ⲁⲣⲁ ⲉ ⲓ ⲟⲩⲏϩ ⲟⲛ ϩⲓϫⲛ ⲧ ⲙⲏⲧⲉ ⲛ ϫⲱ ⲕ \n After all do I still sit upon the middle of your head?
 
 advmod(ⲟⲩⲏϩ, ⲙⲏ)
 ~~~
@@ -34,7 +33,7 @@ Inverted modifiers of the type ⲛⲟϭ ⲛϭⲟⲙ ‘great power’ (lit. a
 ~~~ sdparse
 ⲡⲓ ⲛⲟϭ ⲛ ⲃⲁⲣⲟⲥ \n this great burden
 
-det(ⲛⲟϭ, ⲡⲓ) 
+det(ⲛⲟϭ, ⲡⲓ)
 nmod(ⲛⲟϭ, ⲃⲁⲣⲟⲥ)
 case(ⲃⲁⲣⲟⲥ, ⲛ)
 ~~~
@@ -43,7 +42,7 @@ case(ⲃⲁⲣⲟⲥ, ⲛ)
 
 The independent possessive pronoun ‘that, which is of X, belongs to X’ is analyzed as the head of the phrase, and the possessor is attached as nmod to this:
 
-~~~ sdparse 
+~~~ sdparse
 ⲛⲁ ⲡⲉ ⲭⲣⲓⲥⲧⲟⲥ \n that which is Christ's
 
 nmod(ⲛⲁ, ⲭⲣⲓⲥⲧⲟⲥ)
 
@@ -1,7 +1,6 @@
 ---
 layout: base
 title:  'Tokenization'
-permalink: cop/overview/tokenization.html
 udver: '2'
 ---
 
@@ -17,7 +16,7 @@ For portmanteau tags, tokens which carry a fused portmanteau POS tag receive bot
 
 *Pure universal dependencies*
 
-When using pure dependencies, more ‘lexical’ functions trump more ‘grammatical’ ones, so that examples like ACOND_PPERS are still labeled nsubj, omitting the aux label entirely. This preserves the pure universal dependency tag set. 
+When using pure dependencies, more ‘lexical’ functions trump more ‘grammatical’ ones, so that examples like ACOND_PPERS are still labeled nsubj, omitting the aux label entirely. This preserves the pure universal dependency tag set.
 
 Alternatively, if the intended application of the annotation project supports sub-tokenization, the CoNLL-U format can be used as follows, specifying subtokens/supertokens for fused units:
 
 
@@ -1,43 +1,42 @@
 ---
 layout: base
 title:  'Syntax'
-permalink: el/overview/specific-syntax.html
 ---
 
 ### Free relatives
 
-Free relative clauses are marked as [ccomp](el-dep/ccomp), [csubj](el-dep/csubj), [advcl](el-dep/advcl) and [advcl](el-dep/advcl), depending on their relation to their verbal or nominal head. 
+Free relative clauses are marked as [ccomp](el-dep/ccomp), [csubj](el-dep/csubj), [advcl](el-dep/advcl) and [advcl](el-dep/advcl), depending on their relation to their verbal or nominal head.
 
 ~~~ sdparse
-Για να εντυπωσιάζετε όποιον γνωρίζετε 
+Για να εντυπωσιάζετε όποιον γνωρίζετε
 ccomp(εντυπωσιάζετε, γνωρίζετε)
 dobj(γνωρίζετε, όποιον)
 ~~~
 
 ~~~ sdparse
-Όποιος έφυγε , έχασε 
+Όποιος έφυγε , έχασε
 csubj(έχασε, έφυγε)
 nsubj(έφυγε, Όποιος)
 ~~~
 
 ~~~ sdparse
-Τιμώρησε όποιον μαθητή τον ενοχλούσε 
+Τιμώρησε όποιον μαθητή τον ενοχλούσε
 ccomp(Τιμώρησε, ενοχλούσε)
 nsubj(ενοχλούσε, μαθητή)
 dobj(ενοχλούσε, τον)
 det(μαθητή, όποιον)
 ~~~
 
 ~~~ sdparse
-η ενασχόληση με οποιοδήποτε θέμα σε ενδιαφέρει 
+η ενασχόληση με οποιοδήποτε θέμα σε ενδιαφέρει
 acl(ενασχόληση, ενδιαφέρει)
 dobj(ενδιαφέρει, σε)
 nsubj(ενδιαφέρει, θέμα)
 det(θέμα, οποιοδήποτε)
 ~~~
 
 ~~~ sdparse
-Έλα όποτε ευκαιρήσεις 
+Έλα όποτε ευκαιρήσεις
 advcl(Έλα, ευκαιρήσεις)
 advmod(ευκαιρήσεις, όποτε)
 ~~~