The year 2021 is coming to an end, and like the previous year, it was shaped by
unique challenges that impacted our work together. For Explosion, it was a very
productive year. We found an investor that fits our strategy, we released spaCy
v3, the work on Prodigy Teams is in full swing, and the team has grown
a lot. So here’s our look back at our highlights of the year 2021.
- 💫 Feb 1: We kicked off February with the big release of
spaCy v3.0, which features new
transformer-based pipelines that get spaCy’s accuracy right up to the current
state-of-the-art, and a new workflow system to help you take projects from
prototype to production. If you’re interested in what spaCy v3 is all about,
check out our video, where Ines
and Matt guide you through some of the most exciting new features! - 🪐 Feb 1: As an add-on to our spaCy v3 release, we published
spaCy projects, which allow managing
end-to-end spaCy workflows for different use cases and domains. - 📺 Feb 1: Along with spaCy v3, Ines published the
behind the scene spaCy v3 design concepts
video. - 📺 Feb 1: Sofie celebrated the release of spaCy v3 with her tutorial on
implementing a
trainable entity relation extraction component in spaCy v3. - 📺 Feb 3: At the
Contributing.Today meetup
with Guido van Rossum, Sofie presented the new
spaCy v3 features.
- 💫 Mar 4: We released 1.0 of our
spaCy and Stanza package, which
allows you to use the latest Stanza (StanfordNLP) research models directly in
spaCy. - 📺 Mar 17: March saw
a new episode of Vincent
Warmerdam’s “Intro to NLP with spaCy” series. In this episode, Vincent
showcased the project system of spaCy v3. - 📺 Mar 29: Ines joined the at the German Python Podcast to talk about
Natural Language Processing with spaCy. - 🥳 Mar 30: At the end of March, we celebrated that spaCy reached
20k+ stars on GitHub.
- 📺 Apr 22: Ines was invited as a guest on Microsoft’s
A bit of AI show to talk about
her journey into AI. - 📺 Apr 30: Later that month, Ines joined the
Snorkel Science Talks.
She discussed her path into machine learning, fundamental design decisions
behind spaCy, and the importance of bringing together different stakeholders
in the machine learning development process.
- 💫 Jul 7: We released spaCy v3.1,
which allows using predicted annotations during training. In addition, the
release includes aSpanCategorizer
component for predicting arbitrary and
overlapping spans. You can create training data for it using Prodigy’s new
annotation UI for overlapping spans. - 🤗 Jul 13:
Hugging Face welcomed spaCy to their Hub.
You can now upload any spaCy pipeline using the
spacy-huggingface-hub
CLI, with auto-generated pretty READMEs and a interactive visualizers to try
your pipeline in the browser. - 🥳 Jul 14: Sofie became
team lead for
spaCy.
- ⚙ Aug 12: We’ve partnered with Weights & Biases and the tracking of
reproducible spaCy NLP pipelines
became even easier. - ✨ Aug 12: We released
Prodigy v1.11, which includes a
bunch of new features, including a new installation process via pip and new
wheels for Python 3.9 and ARM architectures, a new recipe and UI for
annotating overlapping and nested spans,
new recipes for improving a sentence recognizer model, further training and
data export recipes that seamlessly integrate with spaCy’s config system. - 📺 Aug 17: Ines was live on the radio and joined the
Byte Into IT show on Melbourne’s Triple R radio
station.
- 💥 Sep 2: A big moment for us – we sold
5% of Explosion.
Since founding Explosion in 2016, we’ve run the company as a profitable
business. Our next step is Prodigy Teams, and doing this project well is much
more important to us than doing it cheaply, so we decided to consider an
external investment. With SignalFire, we found an
investor that fits our strategy.
- 💫 Nov 5: We released spaCy v3.2,
which improved performance for spaCy on Apple M1 and Nvidia GPU, addedDoc
input for pipelines, and provided registered scoring functions. - 🍏 Nov 5: Along with the new spaCy 3.2, we published our
thinc-apple-ops
package to
accelerate spaCy on macOS by calling into Apple’s native “Accelerate” library. - 🌸 Nov 5: We also presented Adriane’s recent work on our new
floret
library, which uses fastText
and Bloom embeddings for compact, full-coverage vectors with spaCy. - 🌳 Nov 17: In mid-November, Daniël presented our new experimental machine
learning-based lemmatizer
that posts accuracies above 95% for many languages. - ✍️ Nov 17: Our machine learning engineer Lj Miranda published a
detailed technical overview
on using spaCy’s project config system, traversing our stack in increasing
levels of abstraction. - 🛡️ Nov 17: The Guardian
wrote about
how their data science team used spaCy and Prodigy to
train a machine learning model that helps extract quotes from news articles
and match them to the correct source. - 📺 Nov 30 Weights & Bias hosted an
AMA with Ines, where she talked
about software development, Python, startups and product building.
- ✍️ Dec 8: For
KDNuggets,
Ines shared her perspective on AI and Machine Learning developments in 2021
and key trends for 2022. - 🏫 Dec 9: We’ve updated our interactive
NLP course for spaCy v3! The updated course is
available in English, Spanish, German, and Japanese. More languages will
follow. - ✍️ Dec 14: To demonstrate the performance of spaCy v3.2, Adriane compiled
a series of UD benchmarks
comparable to the Stanza and Trankit evaluations on Universal Dependencies
v2.5. - 🦑 Dec 15: Our machine learning engineer
Edward published his
blog post on Healthsea, an end-to-end
spaCy pipeline to
analyze user reviews to supplementary products and
extract their potential effects on health. - 📺 Dec 17: Ines was invited as a guest on the TalkPython podcast to
discuss
machine learning ethics and EU laws.
With the community and the team continuing to grow, we look forward to making 2022 even better. Thanks for all your support!