March 28, 2025


Introducing spaCy v3.4 · Explosion


We’re pleased to publish v3.4 of the spaCy Natural Language
Processing library. spaCy v3.4 brings typing and speed improvements along with
new vectors for English pipelines and new trained pipelines for Croatian. This
release also includes prebuilt Linux aarch64 wheels for all spaCy dependencies
distributed by Explosion.

Typing improvements

spaCy v3.4 supports pydantic v1.9 and mypy 0.950+ through extensive updates to
types in Thinc v8.1.

Speed improvements

  • The parser now uses the C saxpy/sgemm routines provided by the Ops
    implementation, which lets it use Accelerate through thinc-apple-ops.
  • Improved speed of vector lookups.
  • Improved speed for Example.get_aligned_parse and Example.get_aligned.

New trained pipelines

v3.4 introduces new CPU/CNN pipelines for Croatian, which use the trainable
lemmatizer and floret vectors. Due to the use of Bloom embeddings and
subwords, the pipelines have compact vectors with no out-of-vocabulary words.
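
To see why Bloom embeddings plus subwords give compact vectors with no out-of-vocabulary words, here is a minimal sketch of the idea. It is not floret's actual implementation: the table size, vector width, hash function (SHA-256 with a seed prefix), n-gram range and the random table standing in for trained weights are all assumptions chosen for illustration.

```python
import hashlib
import random
from typing import List

N_ROWS = 50    # illustrative table size; real tables are far larger
DIM = 4        # illustrative vector width
N_HASHES = 2   # each subword is mapped to this many rows

# Random table standing in for trained embedding rows.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(N_ROWS)]


def subwords(word: str, min_n: int = 3, max_n: int = 4) -> List[str]:
    """Return the boundary-marked word followed by its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(min_n, max_n + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


def rows_for(key: str) -> List[int]:
    """Hash one key with N_HASHES different seeds to pick table rows."""
    rows = []
    for seed in range(N_HASHES):
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
        rows.append(int.from_bytes(digest[:8], "little") % N_ROWS)
    return rows


def embed(word: str) -> List[float]:
    """Average the table rows selected by every subword of the word."""
    selected = [TABLE[r] for gram in subwords(word) for r in rows_for(gram)]
    return [sum(col) / len(selected) for col in zip(*selected)]


known = embed("Zagreb")
unseen = embed("xqzzy")  # a made-up string still gets a vector
```

Because every string hashes to some rows of a fixed-size table, the table stays small regardless of vocabulary size, and no lookup can miss.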

Pipeline updates

All CNN pipelines have been extended with whitespace augmentation.
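
As a rough illustration of what whitespace augmentation means, the sketch below randomly inserts whitespace tokens into a tokenized training text so that a model sees irregular spacing during training. This is not spaCy's own augmenter: the function name, the list of whitespace variants and the `prob` parameter are hypothetical, and a real augmenter would also have to keep the annotations aligned with the new tokens.

```python
import random
from typing import List, Optional

# Illustrative whitespace tokens; the real set may differ.
WHITESPACE_VARIANTS = [" ", "  ", "\n", "\n\n", "\t"]


def augment_whitespace(
    tokens: List[str], prob: float, rng: Optional[random.Random] = None
) -> List[str]:
    """Return a copy of tokens with whitespace tokens randomly inserted."""
    rng = rng or random.Random()
    augmented = []
    for token in tokens:
        augmented.append(token)
        # After each token, insert a whitespace token with probability prob.
        if rng.random() < prob:
            augmented.append(rng.choice(WHITESPACE_VARIANTS))
    return augmented


tokens = ["spaCy", "v3.4", "is", "out", "."]
print(augment_whitespace(tokens, prob=0.3, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the augmentation reproducible between runs.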

The English CNN pipelines have new word vectors, which improve the NER
performance and update the vectors with words like “AirTags”, “Brexit”, “covid”
and “doomscrolling”.

Many cool new plugins, extensions, pipelines and tutorials have been added to
the spaCy universe since v3.3:

  • Aim-spacy: An Aim-based spaCy experiment tracker.
  • Asent: Fast, flexible and transparent sentiment analysis.
  • spaCy fishing: Named entity disambiguation and linking on Wikidata in
    spaCy with Entity-Fishing.
  • spacy-report: Generates interactive reports for spaCy models.

