Introducing spaCy v3.0

spaCy v3.0 is a huge release! It features new
transformer-based pipelines that get spaCy’s accuracy right up to the current
state-of-the-art, and a new workflow system to help you take projects from
prototype to production. It’s much easier to configure and train your pipeline,
and there are lots of new and improved integrations with the rest of the NLP
ecosystem.

We’ve been working on spaCy v3.0 for over a year now, and
almost two years if you count all the work that’s gone into
Thinc. Our main aim with the release is to make it easier to
bring your own models into spaCy, especially state-of-the-art models like
transformers. You can write models powering spaCy components in frameworks like
PyTorch or TensorFlow, using our awesome new configuration system to describe
all of your settings. And since modern NLP workflows often consist of multiple
steps, there’s a new workflow system to help you keep your work organized.

For detailed installation instructions for your platform and setup, check out
the installation quickstart widget.

pip install -U spacy

spaCy v3.0 features all new transformer-based pipelines that bring spaCy’s
accuracy right up to the current state-of-the-art. You can use any
pretrained transformer to train your own pipelines, and even share one
transformer between multiple components with multi-task learning. spaCy’s
transformer support interoperates with PyTorch and the
HuggingFace transformers library,
giving you access to thousands of pretrained models for your pipelines. See
below for an overview of the new pipelines.
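Once a transformer-based pipeline package such as en_core_web_trf is
installed, you can load and use it like any other spaCy pipeline. A minimal
sketch:

import spacy

# Requires the package to be installed first, e.g. via
# python -m spacy download en_core_web_trf
nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")
print([(ent.text, ent.label_) for ent in doc.ents])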

Accuracy on the OntoNotes 5.0 corpus (reported on the development set):

Named Entity Recognition System    OntoNotes    CoNLL ’03
spaCy RoBERTa (2020)               89.7         91.6
Stanza (StanfordNLP) [1]           88.8         92.1
Flair [2]                          89.7         93.1

Named entity recognition accuracy on the OntoNotes 5.0 and CoNLL-2003
corpora. See NLP-progress for more results. Project template:
benchmarks/ner_conll03. [1] Qi et al. (2020). [2] Akbik et al. (2018).

spaCy lets you share a single transformer or other token-to-vector (“tok2vec”)
embedding layer between multiple components. You can even update the shared
layer, performing multi-task learning. Reusing the embedding layer between
components can make your pipeline run a lot faster and result in much smaller
models.

You can share a single transformer or other token-to-vector model between
multiple components by adding a Transformer or Tok2Vec component near the
start of your pipeline. Components later in the pipeline can “connect” to it by
including a listener layer within their model.
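In the training config, the shared setup looks roughly like this. This is a
minimal sketch, assuming the default tok2vec and tagger factories; the
remaining model blocks would be filled in with spacy init fill-config, and
the listener’s upstream name must match the shared component:

[components.tok2vec]
factory = "tok2vec"

[components.tagger]
factory = "tagger"

[components.tagger.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "tok2vec"

This replaces the tagger’s internal embedding layer with a listener that
reads the output of the shared tok2vec component.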

Read more
Benchmarks
Download trained pipelines

spaCy v3.0 provides retrained model families for 18
languages and 59 trained pipelines in total, including 5 new
transformer-based pipelines. You can also train your own transformer-based
pipelines using your own data and transformer weights of your choice.

The models are each trained with a single transformer shared across the
pipeline, which requires the whole pipeline to be trained on a single corpus.
For English and Chinese, we used the OntoNotes 5 corpus, which has
annotations across several tasks. For French, Spanish and German, we didn’t
have a suitable corpus with both syntactic and entity annotations, so the
transformer models for those languages do not include NER.
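To get one of the trained pipelines, you can use the spacy download command
with the package name, for example:

python -m spacy download en_core_web_trf
python -m spacy download de_dep_news_trf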

Download pipelines

spaCy v3.0 introduces a comprehensive and extensible system for configuring
your training runs. A single configuration file describes every detail of
your training run, with no hidden defaults, making it easy to rerun your
experiments and track changes.

You can use the quickstart widget
or the init config command to get
started. Instead of providing lots of arguments on the command line, you only
need to pass your config.cfg file to
spacy train.
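For example, to generate a starter config for an English NER pipeline and
train it (the .spacy paths below are placeholders for your own data files):

python -m spacy init config config.cfg --lang en --pipeline ner
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy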

Training config files include all settings and hyperparameters for training
your pipeline. Some settings can also be registered functions that you can
swap out and customize, making it easy to implement your own custom models and
architectures.

config.cfg

[training]
accumulate_gradient = 3

[training.optimizer]
@optimizers = "Adam.v1"

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.01

Some of the main advantages and features of spaCy’s training config are:

  • Structured sections. The config is grouped into sections, and nested
    sections are defined using the . notation. For example, [components.ner]
    defines the settings for the pipeline’s named entity recognizer. The config
    can be loaded as a Python dict.
  • References to registered functions. Sections can refer to registered
    functions like model architectures,
    optimizers or
    schedules and define arguments that are
    passed into them. You can also
    register your own functions
    to define custom architectures or methods, reference them in your config and
    tweak their parameters.
  • Interpolation. If you have hyperparameters or other settings used by
    multiple components, define them once and reference them as variables
    (see the sketch after this list).
  • Reproducibility with no hidden defaults. The config file is the “single
    source of truth” and includes all settings.
  • Automated checks and validation. When you load a config, spaCy checks if
    the settings are complete and if all values have the correct types. This lets
    you catch potential mistakes early. In your custom architectures, you can use
    Python type hints to tell the
    config which types of data to expect.
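As a small illustration of interpolation, a value defined once can be
referenced anywhere else in the config with the ${...} syntax. A minimal
sketch with made-up values:

[paths]
train = "corpus/train.spacy"

[system]
seed = 42

[training]
seed = ${system.seed}

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}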

Read more

spaCy’s new configuration system makes
it easy to customize the neural network models used by the different pipeline
components. You can also implement your own architectures via spaCy’s machine
learning library Thinc, which provides various layers and
utilities, as well as thin wrappers around frameworks like PyTorch,
TensorFlow and MXNet. Component models all follow the same unified
Model API and each Model can also be used
as a sublayer of a larger network, allowing you to freely combine
implementations from different frameworks into a single model.


Wrapping a PyTorch model

from torch import nn
from thinc.api import PyTorchWrapper

# Define a regular PyTorch model ...
torch_model = nn.Sequential(
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Softmax(dim=1),
)
# ... and wrap it so it can be used as a Thinc Model
model = PyTorchWrapper(torch_model)
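The wrapped model then behaves like any other Thinc Model: you can
initialize it and run it on array data, with Thinc handling the conversion
between NumPy arrays and torch tensors. A small usage sketch:

import numpy

X = numpy.zeros((2, 32), dtype="f")
model.initialize(X=X)
Y = model.predict(X)  # a (2, 32) array of softmax outputs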

Read more

spaCy projects let you manage and share
end-to-end spaCy workflows for different use cases and domains, and
orchestrate training, packaging and serving your custom pipelines. You can start
off by cloning a pre-defined project template, adjust it to fit your needs, load
in your data, train a pipeline, export it as a Python package, upload your
outputs to remote storage and share your results with your team.

spaCy projects also make it easy to integrate with other tools in the data
science and machine learning ecosystem, including
DVC for data version control,
Prodigy for creating labelled data,
Streamlit for building interactive
apps, FastAPI for serving models in
production, Ray for parallel training,
Weights & Biases for experiment
tracking, and more!

Using spaCy projects

python -m spacy project clone pipelines/tagger_parser_ud
cd tagger_parser_ud
python -m spacy project assets
python -m spacy project run all

Selected example templates

To clone a template, you can run the spacy project clone command with its
relative path, e.g. python -m spacy project clone pipelines/ner_wikiner.

Read more
Project templates

Track your results with Weights & Biases

Weights & Biases is a popular platform for experiment
tracking. spaCy integrates with it out-of-the-box via the
WandbLogger, which you can add
as the [training.logger] block of your training
config.

The results of each step are then logged in your project, together with the full
training config. This means that every hyperparameter, registered function
name and argument will be tracked and you’ll be able to see the impact it has on
your results.

config.cfg

[training.logger]
@loggers = "spacy.WandbLogger.v1"
project_name = "monitor_spacy_training"
remove_config_values = ["paths.train", "paths.dev", "training.dev_corpus.path", "training.train_corpus.path"]

Ray is a fast and simple framework for building and running
distributed applications. You can use Ray to train spaCy on one or more
remote machines, potentially speeding up your training process.

The Ray integration is powered by a lightweight extension package,
spacy-ray, that automatically adds
the ray command to your spaCy CLI if it’s
installed in the same environment. You can then run
spacy ray train for parallel training.

Parallel training with Ray

pip install spacy-ray --pre
python -m spacy ray --help
python -m spacy ray train config.cfg --n-workers 2

Read more

spacy-ray

spaCy v3.0 includes several new trainable and rule-based components that you can
add to your pipeline and customize for your use case.

Defining, configuring, reusing, training and analyzing
pipeline components
is now easier and more convenient. The
@Language.component and
@Language.factory decorators let you
register your component and define its default configuration and metadata, like
the attribute values it assigns and requires. Any custom component can be
included during training, and sourcing components from existing trained
pipelines lets you mix and match custom pipelines. The
nlp.analyze_pipes method
outputs structured information about the current pipeline and its components,
including the attributes they assign, the scores they compute during training
and whether any required attributes aren’t set.

import spacy
from spacy.language import Language

# Register a custom stateless component
@Language.component("my_component")
def my_component(doc):
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("my_component")

# Source a component from an existing trained pipeline
other_nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("ner", source=other_nlp)

# Print structured information about the current pipeline
nlp.analyze_pipes(pretty=True)

Read more

The new DependencyMatcher lets you
match patterns within the dependency parse using
Semgrex
operators. It follows the same API as the token-based
Matcher. A pattern added to the dependency
matcher consists of a list of dictionaries, with each dictionary describing
a token to match and its relation to an existing token in the pattern.

[Illustration: part of the dependency match pattern]

import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    # Anchor token: "founded"
    {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}},
    # Subject of "founded"
    {"LEFT_ID": "anchor_founded", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    # Direct object of "founded"
    {"LEFT_ID": "anchor_founded", "REL_OP": ">", "RIGHT_ID": "founded_object", "RIGHT_ATTRS": {"DEP": "dobj"}},
    # Adjectival or compound modifier of the object
    {"LEFT_ID": "founded_object", "REL_OP": ">", "RIGHT_ID": "founded_object_modifier", "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "compound"]}}},
]

matcher.add("FOUNDED", [pattern])
doc = nlp("Lee, an experienced CEO, has founded two AI startups.")
matches = matcher(doc)
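Each match is a (match_id, token_ids) tuple, where the token IDs line up with
the order of the dictionaries in the pattern. A small usage sketch:

for match_id, token_ids in matches:
    # token_ids follow the pattern order: anchor, subject, object, modifier
    print([doc[token_id].text for token_id in token_ids])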

Read more

spaCy v3.0 officially drops support for Python 2 and now requires Python
3.6+. This also means that the code base can take full advantage of
type hints. spaCy’s user-facing
API that’s implemented in pure Python (as opposed to Cython) now comes with type
hints. The new version of spaCy’s machine learning library
Thinc also features extensive
type support, including custom
types for models and arrays, and a custom mypy plugin that can be used to
type-check model definitions.

For data validation, spaCy v3.0 adopts
pydantic. It also powers the data
validation of Thinc’s config system, which
lets you register custom functions with typed arguments, reference them in
your config and see validation errors if the argument values don’t match.

Argument validation with type hints

from spacy.language import Language
from pydantic import StrictBool

@Language.factory("my_component")
def create_component(nlp: Language, name: str, custom: StrictBool):
    ...
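If a config value doesn’t match the declared type, spaCy raises a validation
error when the component is created. A sketch, assuming the factory above;
StrictBool won’t coerce a string like "yes":

import spacy

nlp = spacy.blank("en")
# Raises a config validation error instead of silently accepting "yes"
nlp.add_pipe("my_component", config={"custom": "yes"})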

Read more
