README.md: 17 additions & 8 deletions
@@ -19,7 +19,7 @@
Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.
-Spark NLP comes with **83000+** pretrained **pipelines** and **models** in more than **200+** languages.
+Spark NLP comes with **100000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, **Llama**, **Mistral**, **Phi**, **Qwen2**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
@@ -63,7 +63,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.5.3 pyspark==3.3.1
+$ pip install spark-nlp==6.0.0 pyspark==3.3.1
```
In Python console or Jupyter `Python3` kernel:
@@ -129,10 +129,11 @@ For a quick example of using pipelines and models take a look at our official [d
### Apache Spark Support
-Spark NLP *5.5.3* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *6.0.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
@@ -157,7 +159,7 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http
### Databricks Support
-Spark NLP 5.5.3 has been tested and is compatible with the following runtimes:
+Spark NLP 6.0.0 has been tested and is compatible with the following runtimes:
|**CPU**|**GPU**|
|--------------------|--------------------|
@@ -174,7 +176,7 @@ We are compatible with older runtimes. For a full list check databricks support
### EMR Support
-Spark NLP 5.5.3 has been tested and is compatible with the following EMR releases:
+Spark NLP 6.0.0 has been tested and is compatible with the following EMR releases:
|**EMR Release**|
|--------------------|
@@ -184,6 +186,13 @@ Spark NLP 5.5.3 has been tested and is compatible with the following EMR release
| emr-7.0.0 |
| emr-7.1.0 |
| emr-7.2.0 |
+| emr-7.3.0 |
+| emr-7.4.0 |
+| emr-7.5.0 |
+| emr-7.6.0 |
+| emr-7.7.0 |
+| emr-7.8.0 |
+
We are compatible with older EMR releases. For a full list check EMR support in our official [documentation](https://sparknlp.org/docs/en/install#emr-support)
@@ -205,7 +214,7 @@ deployed to Maven central. To add any of our packages as a dependency in your ap
from our official documentation.
If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your
@@ -250,7 +259,7 @@ In Spark NLP we can define S3 locations to:
Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration) from our official documentation.
-## Document5.5.3
+## Documentation
### Examples
@@ -283,7 +292,7 @@ the Spark NLP library:
keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}