March 29, 2025

ikayaniaamirshahzad@gmail.com

Introducing spaCy v3.3 · Explosion

We’re pleased to present v3.3 of the spaCy Natural Language
Processing library. spaCy v3.3 improves the speed of nearly all statistical
pipeline components, adds a trainable lemmatizer and includes new trained
pipelines for Finnish, Korean and Swedish.

spaCy v3.3 includes a slew of optimizations that make all core pipeline
components faster in both training and inference. For longer texts, the
trained pipelines run prediction 15% or more faster. Detailed benchmarks for
en_core_web_md show the speed improvements from spaCy v3.2 to v3.3:

Speed Benchmarks: en_core_web_md

CPU	Avg. Words/Doc	v3.2 Words/Sec	v3.3 Words/Sec	Diff
Intel Xeon W-2265	100	17292	17441	0.86%
Intel Xeon W-2265	1000	15408	16024	4.00%
Intel Xeon W-2265	10000	12798	15346	19.91%
Apple M1	100	18272	18408	0.74%
Apple M1	1000	18794	19248	2.42%
Apple M1	10000	15144	17513	15.64%
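
The Diff column appears to be the relative change in words per second from
v3.2 to v3.3. A small sketch that recomputes it from the figures in the table
(the function name `relative_speedup` is only illustrative):

```python
# Rows copied from the benchmark table:
# (CPU, avg. words per doc, v3.2 words/sec, v3.3 words/sec)
BENCHMARKS = [
    ("Intel Xeon W-2265", 100, 17292, 17441),
    ("Intel Xeon W-2265", 1000, 15408, 16024),
    ("Intel Xeon W-2265", 10000, 12798, 15346),
    ("Apple M1", 100, 18272, 18408),
    ("Apple M1", 1000, 18794, 19248),
    ("Apple M1", 10000, 15144, 17513),
]


def relative_speedup(old_wps, new_wps):
    """Percentage change in throughput going from old_wps to new_wps."""
    return (new_wps - old_wps) / old_wps * 100


for cpu, words_per_doc, old_wps, new_wps in BENCHMARKS:
    diff = relative_speedup(old_wps, new_wps)
    print(f"{cpu:<18} {words_per_doc:>6} {diff:6.2f}%")
```

Only the 10000-word rows clear the 15% mark, which matches the statement that
the gain is largest for longer texts.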

The new trainable lemmatizer component uses edit trees to transform tokens
into lemmas. Try out the trainable lemmatizer with the training quickstart!
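
To give a feel for what an edit tree is, here is a simplified, pure-Python
illustration. It is not spaCy's implementation: the tuple layout and the
helper names `build_tree` and `apply_tree` are made up for this sketch, and
`difflib` stands in for the longest-common-substring search. The idea is to
keep the longest shared substring of form and lemma, then recursively describe
how the remaining prefix and suffix change.

```python
from difflib import SequenceMatcher


def build_tree(form, lemma):
    """Build a toy edit tree that rewrites `form` into `lemma`."""
    matcher = SequenceMatcher(None, form, lemma, autojunk=False)
    match = matcher.find_longest_match(0, len(form), 0, len(lemma))
    if match.size == 0:
        # Nothing in common: a leaf that substitutes one string for another.
        return ("sub", form, lemma)
    form_end = match.a + match.size
    lemma_end = match.b + match.size
    prefix_len = match.a                  # characters before the shared part
    suffix_len = len(form) - form_end     # characters after the shared part
    left = build_tree(form[:match.a], lemma[:match.b])
    right = build_tree(form[form_end:], lemma[lemma_end:])
    return ("match", prefix_len, suffix_len, left, right)


def apply_tree(tree, form):
    """Apply a toy edit tree to `form`; return None if it does not fit."""
    if tree[0] == "sub":
        _, old, new = tree
        return new if form == old else None
    _, prefix_len, suffix_len, left, right = tree
    if prefix_len + suffix_len > len(form):
        return None
    end = len(form) - suffix_len
    head = apply_tree(left, form[:prefix_len])
    tail = apply_tree(right, form[end:])
    if head is None or tail is None:
        return None
    # The middle part is copied unchanged; only prefix and suffix are edited.
    return head + form[prefix_len:end] + tail


tree = build_tree("walked", "walk")
print(apply_tree(tree, "talked"))   # talk
print(apply_tree(tree, "running"))  # None: the suffix "ed" is not present
```

Because the tree records lengths and substitutions rather than the word
itself, the tree learned from "walked" also applies to "talked". A trainable
lemmatizer can then treat lemmatization as choosing the right tree for each
token.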

displaCy now supports overlapping span annotation from Doc.spans:

displaCy for overlapping spans
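
A minimal sketch of rendering overlapping spans, assuming spaCy v3.3 or later
is installed. The example sentence and labels are chosen for illustration, and
the spans are stored under the key "sc", which is the key the span visualizer
reads by default:

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# A blank English pipeline is enough: the spans are added by hand.
nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Two overlapping spans: "Bank of China" (tokens 3-5) and "China" (token 5).
doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

# jupyter=False makes render() return the markup as a string.
html = displacy.render(doc, style="span", jupyter=False)
```

The returned string is HTML markup that can be written to a file or embedded
in a page.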

v3.3 introduces new CPU/CNN pipelines for Finnish, Korean and Swedish, which use
the new trainable lemmatizer and floret vectors. Due to the use of Bloom
embeddings and subwords, the pipelines have compact vectors with no
out-of-vocabulary words.
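
The reason hashed subwords leave no out-of-vocabulary words can be shown with
a toy sketch. This is not floret's code: the table sizes, the hash function,
the n-gram range and all names below are assumptions for illustration. Each
word is split into character n-grams, each n-gram is hashed into several rows
of one small table, and the word vector is the average of those rows, so any
string gets a vector.

```python
import hashlib
import random

# Toy sizes, far smaller than a real vector table (hypothetical values).
ROWS, DIM, HASHES = 50, 4, 2

# A fixed pseudo-random table standing in for trained embedding rows.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(ROWS)]


def subwords(word, min_n=3, max_n=4):
    """Return the marked word plus its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(min_n, max_n + 1):
        grams += [marked[i:i + n] for i in range(len(marked) - n + 1)]
    return grams


def rows_for(piece):
    """Hash one piece into HASHES rows (the Bloom-embedding idea)."""
    rows = []
    for seed in range(HASHES):
        digest = hashlib.blake2b(
            piece.encode("utf8"), digest_size=8, salt=bytes([seed])
        ).digest()
        rows.append(int.from_bytes(digest, "big") % ROWS)
    return rows


def vector(word):
    """Average the table rows selected by all subwords of `word`."""
    pieces = subwords(word)
    total = [0.0] * DIM
    for piece in pieces:
        for row in rows_for(piece):
            for k in range(DIM):
                total[k] += TABLE[row][k]
    return [x / len(pieces) for x in total]


# Even a made-up word maps to rows of the table, so it still gets a vector.
print(vector("lemmatisoija"))
```

The table can stay small because several pieces may share a row. Using more
than one hash per piece reduces the chance that two different pieces collide
on every row.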

The trained pipelines for the following languages switch from lookup or
rule-based lemmatizers to the new trainable lemmatizer:

Lemmatizer Accuracy (md Pipeline)

Many cool new plugins, extensions, pipelines and tutorials have been added to
the spaCy universe since v3.2:

View the spaCy universe
