Commit 95c6f33

Exclude parser when running Spacy model
Doesn't load unnecessary components when loading the Spacy sentence segmentation model. This should improve performance.

> The SentenceRecognizer is a simple statistical component that only provides sentence boundaries. Along with being faster and smaller than the parser, its primary advantage is that it’s easier to train because it only requires annotated sentence boundaries rather than full dependency parses. spaCy’s trained pipelines include both a parser and a trained sentence segmenter, which is disabled by default. If you only need sentence boundaries and no parser, you can use the exclude or disable argument on spacy.load

https://spacy.io/usage/linguistic-features/#sbd-senter
1 parent 275e7f7 commit 95c6f33

1 file changed: +4 -2 lines changed
argostranslate/sbd.py

@@ -18,16 +18,18 @@ def split_sentences(self, text: str, lang_code: Optional[str] = None) -> List[st
 
 # Spacy sentence boundary detection Sentencizer
 # https://community.libretranslate.com/t/sentence-boundary-detection-for-machine-translation/606/3
+# https://spacy.io/usage/linguistic-features/#sbd
 
 # Download model:
 # python -m spacy download xx_sent_ud_sm
 class SpacySentencizerSmall(ISentenceBoundaryDetectionModel):
     def __init__(self):
         try:
-            self.nlp = spacy.load("xx_sent_ud_sm")
+            self.nlp = spacy.load("xx_sent_ud_sm", exclude=["parser"])
         except OSError:
+            # Automatically download the model if it doesn't exist
             spacy.cli.download("xx_sent_ud_sm")
-            self.nlp = spacy.load("xx_sent_ud_sm")
+            self.nlp = spacy.load("xx_sent_ud_sm", exclude=["parser"])
         self.nlp.add_pipe("sentencizer")
 
     def split_sentences(self, text: str, lang_code: Optional[str] = None) -> List[str]:
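
The idea behind this change, keeping only the components that sentence splitting needs, can be tried out without downloading a model. The following is a minimal sketch, not the code from this commit: it assumes spaCy v3 is installed and uses a blank multi-language pipeline (`spacy.blank("xx")`) with only the rule-based `sentencizer` added. The module-level `split_sentences` helper is hypothetical and only loosely mirrors the method of the same name in `argostranslate/sbd.py`, whose body is not shown in the diff.

```python
from typing import List

import spacy

# Assumption: spaCy v3 is installed. A blank multi-language pipeline
# contains only a tokenizer, so no trained model has to be downloaded.
nlp = spacy.blank("xx")

# The rule-based sentencizer marks sentence boundaries on punctuation
# and does not need a parser.
nlp.add_pipe("sentencizer")


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: return the sentences spaCy finds in `text`."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


if __name__ == "__main__":
    # pipe_names lists the components that are actually loaded.
    print(nlp.pipe_names)
    print(split_sentences("Hello world. How are you?"))
```

Inspecting `nlp.pipe_names` in the same way after `spacy.load("xx_sent_ud_sm", exclude=["parser"])` is one way to check which components a trained pipeline ended up with.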
