
Commit 7f05669

Bump version to 5.5.0 [run doc]
1 parent ba18698 commit 7f05669

File tree

19 files changed: +277 / -1244 lines changed


CHANGELOG

Lines changed: 23 additions & 1 deletion
@@ -1,3 +1,26 @@
+========
+5.5.0
+========
+----------------
+New Features & Enhancements
+----------------
+* Introduced QWEN2Transformer (#14188)
+* Introduced MiniCPM (#14205)
+* Introduced NLLB (#14209)
+* Implemented Nomic embeddings (#14217)
+* Introduced CamemBertForZeroShotClassification annotator (#14354)
+* Implemented Mxbai Embeddings (#14355)
+* Introduced AlbertForZeroShotClassification (#14361)
+* Introduced Phi-3 (#14373)
+* Implemented Starcoder2 for causal language modeling (#14358)
+* Integrated llama.cpp (#14364)
+* Implemented SnowFlake (#14353)
+* Introduced ONNX support to vision annotators (#14356)
+* Introduced ONNX and OpenVINO support to Missing Annotators (#14359)
+* Added OpenVINO install instructions (#14382)
+* Exported notebooks for release candidate (#14393)
+
+
 ========
 5.4.2
 ========
@@ -9,7 +32,6 @@ New Features & Enhancements
 * Added aggressiveMatching parameter to DocumentSimilarityRanker annotator
 
 
-
 ========
 5.4.1
 ========
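For readers scanning the changelog above, the new annotators plug into the usual Spark NLP pipeline API. Below is a minimal, hedged sketch of the CamemBertForZeroShotClassification annotator introduced in #14354; the wiring mirrors the existing *ForZeroShotClassification classes, while the default pretrained() model and the candidate labels are illustrative assumptions, not taken from this commit.

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, CamemBertForZeroShotClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Standard Spark NLP pipeline: raw text -> document -> tokens -> zero-shot labels
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Default pretrained() model and candidate labels are assumptions for illustration
zero_shot_classifier = CamemBertForZeroShotClassification.pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCandidateLabels(["sport", "politique", "science"])

pipeline = Pipeline(stages=[document_assembler, tokenizer, zero_shot_classifier])

data = spark.createDataFrame(
    [["La France a remporté la coupe du monde de football."]]
).toDF("text")

pipeline.fit(data).transform(data).select("class.result").show(truncate=False)
```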

README.md

Lines changed: 38 additions & 15 deletions
@@ -17,19 +17,31 @@
 <img src="https://static.pepy.tech/personalized-badge/spark-nlp?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads" /></a>
 </p>
 
-Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
-environment.
-Spark NLP comes with **36000+** pretrained **pipelines** and **models** in more than **200+** languages.
+Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.
+
+Spark NLP comes with **83000+** pretrained **pipelines** and **models** in more than **200+** languages.
 It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
 
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, **Llama**, **Mistral**, **Phi**, **Qwen2**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+
+## Model Importing Support
+
+Spark NLP provides easy support for importing models from various popular frameworks:
+
+- **TensorFlow**
+- **ONNX**
+- **OpenVINO**
+- **Llama.cpp (GGUF)**
+
+This wide range of support allows you to seamlessly integrate models from different sources into your Spark NLP workflows, enhancing flexibility and compatibility with existing machine learning ecosystems.
 
 ## Project's website
 
 Take a look at our official Spark NLP page: [https://sparknlp.org/](https://sparknlp.org/) for user
 documentation and examples
 
 ## Features
+
 - [Text Preprocessing](https://sparknlp.org/docs/en/features#text-preproccesing)
 - [Parsing and Analysis](https://sparknlp.org/docs/en/features#parsing-and-analysis)
 - [Sentiment and Classification](https://sparknlp.org/docs/en/features#sentiment-and-classification)
@@ -51,7 +63,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.5.0-rc1 pyspark==3.3.1
+$ pip install spark-nlp==5.5.0 pyspark==3.3.1
 ```
 
 In Python console or Jupyter `Python3` kernel:
@@ -108,6 +120,7 @@ community and we had to build most of the dependencies by ourselves to make them
 architectures, however, they may not work in some environments.
 
 ## Pipelines and Models
+
 For a quick example of using pipelines and models take a look at our official [documentation](https://sparknlp.org/docs/en/install#pipelines-and-models)
 
 #### Please check out our Models Hub for the full list of [pre-trained models](https://sparknlp.org/models) with examples, demo, benchmark, and more
@@ -116,10 +129,11 @@ For a quick example of using pipelines and models take a look at our official [d
 
 ### Apache Spark Support
 
-Spark NLP *5.5.0-rc1* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *5.5.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
 
 | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| 5.5.x | YES | YES | YES | YES | YES | YES | NO | NO |
 | 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
 | 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
 | 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
@@ -132,6 +146,8 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
 
 | Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
 |-----------|------------|------------|------------|------------|------------|------------|------------|
+| 5.5.x | NO | YES | YES | YES | YES | NO | YES |
+| 5.4.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.3.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.2.x | NO | YES | YES | YES | YES | NO | YES |
 | 5.1.x | NO | YES | YES | YES | YES | NO | YES |
@@ -141,38 +157,45 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http
 
 ### Databricks Support
 
-Spark NLP 5.5.0-rc1 has been tested and is compatible with the following runtimes:
+Spark NLP 5.5.0 has been tested and is compatible with the following runtimes:
 
 | **CPU** | **GPU** |
 |--------------------|--------------------|
-| 14.0 / 14.0 ML | 14.0 ML & GPU |
 | 14.1 / 14.1 ML | 14.1 ML & GPU |
 | 14.2 / 14.2 ML | 14.2 ML & GPU |
 | 14.3 / 14.3 ML | 14.3 ML & GPU |
+| 15.0 / 15.0 ML | 15.0 ML & GPU |
+| 15.1 / 15.0 ML | 15.1 ML & GPU |
+| 15.2 / 15.0 ML | 15.2 ML & GPU |
+| 15.3 / 15.0 ML | 15.3 ML & GPU |
+| 15.4 / 15.0 ML | 15.4 ML & GPU |
 
 We are compatible with older runtimes. For a full list check databricks support in our official [documentation](https://sparknlp.org/docs/en/install#databricks-support)
 
 ### EMR Support
 
-Spark NLP 5.5.0-rc1 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.5.0 has been tested and is compatible with the following EMR releases:
 
 | **EMR Release** |
 |--------------------|
 | emr-6.13.0 |
 | emr-6.14.0 |
 | emr-6.15.0 |
 | emr-7.0.0 |
+| emr-7.1.0 |
+| emr-7.2.0 |
 
 We are compatible with older EMR releases. For a full list check EMR support in our official [documentation](https://sparknlp.org/docs/en/install#emr-support)
 
 Full list of [Amazon EMR 6.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html)
-Full list 5.5.0-rc1mazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html)
+Full list of [Amazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html)
 
 NOTE: The EMR 6.1.0 and 6.1.1 are not supported.
 
 ## Installation
 
 ### Command line (requires internet connection)
+
 To install spark-nlp packages through command line follow [these instructions](https://sparknlp.org/docs/en/install#command-line) from our official documentation
 
 ### Scala
@@ -182,18 +205,19 @@ deployed to Maven central. To add any of our packages as a dependency in your ap
 from our official documentation.
 
 If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your
-projects [Spark NLP SBT S5.5.0-rc1r](https://github.com/maziyarpanahi/spark-nlp-starter)
+projects [Spark NLP SBT S5.5.0r](https://github.com/maziyarpanahi/spark-nlp-starter)
 
 ### Python
 
 Spark NLP supports Python 3.7.x and above depending on your major PySpark version.
 Check all available installations for Python in our official [documentation](https://sparknlp.org/docs/en/install#python)
 
-
 ### Compiled JARs
+
 To compile the jars from source follow [these instructions](https://sparknlp.org/docs/en/compiled#jars) from our official documenation
 
 ## Platform-Specific Instructions
+
 For detailed instructions on how to use Spark NLP on supported platforms, please refer to our official documentation:
 
 | Platform | Supported Language(s) |
@@ -206,7 +230,6 @@ For detailed instructions on how to use Spark NLP on supported platforms, please
 | [EMR Cluster](https://sparknlp.org/docs/en/install#emr-cluster) | Scala, Python |
 | [GCP Dataproc Cluster](https://sparknlp.org/docs/en/install#gcp-dataproc) | Scala, Python |
 
-
 ### Offline
 
 Spark NLP library and all the pre-trained models/pipelines can be used entirely offline with no access to the Internet.
@@ -227,7 +250,7 @@ In Spark NLP we can define S3 locations to:
 
 Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration) from our official documentation.
 
-## Document5.5.0-rc1
+## Document5.5.0
 
 ### Examples
 
@@ -260,7 +283,7 @@ the Spark NLP library:
 keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
 abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}
 }
-}5.5.0-rc1
+}5.5.0
 ```
 
 ## Community support
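The new "Model Importing Support" section added in this README lists the supported export formats. As a rough sketch of what that import path looks like in practice, the snippet below assumes a BERT checkpoint that has already been exported to ONNX (for example via Hugging Face Optimum) and placed in a local folder; the folder names are placeholders, and the exact export steps are documented per annotator on the Models Hub.

```python
import sparknlp
from sparknlp.annotator import BertEmbeddings

spark = sparknlp.start()

# Load an externally exported model (TensorFlow SavedModel or ONNX) from disk.
# "./export_onnx/bert-base-cased" is a placeholder path for this sketch.
embeddings = BertEmbeddings.loadSavedModel("./export_onnx/bert-base-cased", spark) \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Persist it once as a Spark NLP model so it can later be reloaded with
# BertEmbeddings.load(...) and distributed across the cluster.
embeddings.write().overwrite().save("./spark_nlp_models/bert_base_cased_onnx")
```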

build.sbt

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)
 
 organization := "com.johnsnowlabs.nlp"
 
-version := "5.5.0-rc1"
+version := "5.5.0"
 
 (ThisBuild / scalaVersion) := scalaVer

docs/_layouts/landing.html

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
 <div class="highlight-box">
 {% highlight bash %}
 # Using PyPI
-$ pip install spark-nlp==5.5.0-rc1
+$ pip install spark-nlp==5.5.0
 
 # Using Anaconda/Conda
 $ conda install -c johnsnowlabs spark-nlp

docs/en/advanced_settings.md

Lines changed: 3 additions & 3 deletions
@@ -52,7 +52,7 @@ spark = SparkSession.builder
 .config("spark.kryoserializer.buffer.max", "2000m")
 .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
 .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
-.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0-rc1")
+.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0")
 .getOrCreate()
 ```
 
@@ -66,7 +66,7 @@ spark-shell \
 --conf spark.kryoserializer.buffer.max=2000M \
 --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
 --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
---packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0-rc1
+--packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0
 ```
 
 **pyspark:**
@@ -79,7 +79,7 @@ pyspark \
 --conf spark.kryoserializer.buffer.max=2000M \
 --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
 --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
---packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0-rc1
+--packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0
 ```
 
 **Databricks:**
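The manual `SparkSession.builder` and `--packages` configurations updated in this file all resolve the same spark-nlp_2.12:5.5.0 artifact. When no custom settings are needed, the `sparknlp.start()` helper builds an equivalent session from Python; a brief sketch, with the memory value purely illustrative:

```python
import sparknlp

# Builds a SparkSession with com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0 on the
# classpath, mirroring the manual builder / --packages configuration above.
spark = sparknlp.start(memory="16G")

print("Apache Spark version:", spark.version)
print("Spark NLP version:", sparknlp.version())
```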

docs/en/concepts.md

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.5.0-rc1 pyspark==3.3.1 jupyter
+$ pip install spark-nlp==5.5.0 pyspark==3.3.1 jupyter
 $ jupyter notebook
 ```

docs/en/examples.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ $ java -version
 # should be Java 8 (Oracle or OpenJDK)
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
-$ pip install spark-nlp==5.5.0-rc1 pyspark==3.3.1
+$ pip install spark-nlp==5.5.0 pyspark==3.3.1
 ```
 
 </div><div class="h3-box" markdown="1">
@@ -40,7 +40,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
 # -p is for pyspark
 # -s is for spark-nlp
 # by default they are set to the latest
-!bash colab.sh -p 3.2.3 -s 5.5.0-rc1
+!bash colab.sh -p 3.2.3 -s 5.5.0
 ```
 
 [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines.
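Once the install line above succeeds (locally or via `colab.sh`), the quick-start notebook linked in this file reduces to a few lines. A minimal sketch using the `explain_document_dl` pretrained pipeline from the official quick start; the sample sentence and printed entities are illustrative:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Downloads the pretrained pipeline on first use and caches it locally
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

result = pipeline.annotate(
    "Google has announced the release of a beta version of the popular "
    "TensorFlow machine learning library."
)
print(result["entities"])  # e.g. ['Google', 'TensorFlow']
```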

docs/en/hardware_acceleration.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Since the new Transformer models such as BERT for Word and Sentence embeddings a
 | DeBERTa Large | +477%(5.8x) |
 | Longformer Base | +52%(1.5x) |
 
-Spark NLP 5.5.0-rc1 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 5.5.0 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
 
 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
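The GPU speedups tabulated in this file only apply when the session is started against the GPU build and the NVIDIA requirements above are met; a one-line sketch:

```python
import sparknlp

# Pulls the GPU build of Spark NLP 5.5.0; requires the NVIDIA driver,
# CUDA toolkit, and cuDNN versions listed in this document.
spark = sparknlp.start(gpu=True)
```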
