Commit 3b729c3

Ancient Greek tokenization documentation.
1 parent f2e553c commit 3b729c3

1 file changed: 3 additions, 6 deletions

_grc/index.md

@@ -8,12 +8,9 @@ udver: '2'
 
 ## Tokenization and Word Segmentation
 
-*
-
----
-**Instruction**: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.
-
----
+* In general, words are delimited by whitespace characters. Description of exceptions follows.
+* According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words).
+* There are neither multi-word tokens nor words with spaces.
 
 ## Morphology
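
The three rules added in this commit (split on whitespace, detach punctuation into its own token, no multi-word tokens or words with spaces) can be illustrated with a minimal Python sketch. This is not the tokenizer used to build the treebanks. The function name `tokenize` and the punctuation inventory `PUNCT` are assumptions made for illustration only, since the documentation does not list which marks are split off.

```python
import re

# Hypothetical punctuation inventory (an assumption, not taken from the
# documentation): period, comma, semicolon, colon, middle dot (U+00B7),
# Greek ano teleia (U+0387) and Greek question mark (U+037E).
PUNCT = ".,;:\u00b7\u0387\u037e"

# A token is either one punctuation mark or a maximal run of characters
# that are neither whitespace nor punctuation.
_TOKEN_RE = re.compile(rf"[{re.escape(PUNCT)}]|[^\s{re.escape(PUNCT)}]+")


def tokenize(text: str) -> list[str]:
    """Split text on whitespace and detach each punctuation mark."""
    return _TOKEN_RE.findall(text)


print(tokenize("μῆνιν ἄειδε, θεά."))
```

Because every token produced this way corresponds to exactly one syntactic word, no multi-word token ranges are needed, and because whitespace is never kept inside a match, no word contains a space.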
