While 2020 hasn’t been easy for anyone, at Explosion we’ve considered ourselves
relatively fortunate in this most
interesting
year. We’ve always worked remotely, so we’ve been able to take both pride and
comfort in continuing to ship good software. Here’s a look back at what we’ve
been up to.
- 🔮 Jan 28: 2020 started with a big release: the alpha of
Thinc v8.0, a lightweight deep learning library that
offers an elegant, type-checked, functional-programming API for composing
models, with support for layers defined in other frameworks such as PyTorch,
TensorFlow or MXNet. Thinc was rewritten from the ground up to support some
of the new workflows coming to spaCy v3.0, including
a flexible training configuration system and the ability to plug in model
implementations written in any framework.
- 🎤 Feb 8: In February, Matt and Ines were invited to PyCon Colombia in
Medellín – thanks to the team for organizing such an awesome event! Ines
presented a keynote titled
“The Future of NLP in Python”
about how new Python tooling and advancements in Natural Language Processing
help with closing the gap between prototype and production, making it easier
to ship powerful natural language understanding pipelines.
- 📺 Feb 8: At PyCon Colombia, Ines was also
interviewed by Karolina Ladino
and they talked about the history of spaCy and how to get into programming, machine
learning and NLP.
- 📺 Mar 2: March started with a
new episode of
Vincent Warmerdam’s popular video series,
“Intro to NLP with spaCy”. In this episode, he explored the processing
pipeline and trained a simple NER model to detect programming languages.
- 📺 Mar 16: Ines published an end-to-end
video tutorial showing how to
use our annotation tool Prodigy to train a named entity
recognition model from scratch, by taking advantage of semi-automatic
annotation and modern transfer learning techniques.
- 💻 Mar 20: Sebastián released Typer,
a library for building modern CLIs, powered by Python type hints. We’ve been
using it extensively in our projects ever since!
- 📺 Mar 24: In the next Prodigy
tutorial video, Ines showed how
to build fully custom annotation workflows and UIs for image captioning, and
how to plug in a simple PyTorch image captioning model. Also: cats! 😺
- 📻 Mar 30: Towards the end of the month, Matt joined the
Podcast.__init__
podcast again and discussed Explosion’s developer tools stack and what’s next
for spaCy, Thinc and Prodigy.
- 🏫 Apr 21: In April, we released the first translation of the free spaCy
online course, Modernes NLP mit spaCy,
featuring German instructions and text examples.
- 📻 Apr 26: Ines was also invited as a guest on the
Chai Time Data Science podcast
and talked about her NLP journey, spaCy and Prodigy, open-source development,
and tattoos.
- 🏫 May 6: May started off with a Japanese translation of the free spaCy
online course:
spaCy を使った先進的な自然言語処理. Special
thanks to Yohei Tamura!
- 📺 May 7: A day later, Sofie released an end-to-end video tutorial showing
how to train your own
custom Entity Linking model
with spaCy to disambiguate different mentions of a person's name to unique
identifiers in a knowledge base, and how to create your own training data from
scratch.
- 🏫 May 11: ¡Hola! The free spaCy online course was released in Spanish,
complete with Spanish text examples:
NLP avanzado con spaCy. Thanks to
Camila Gutierrez!
- 📺 May 14: May featured even more additions to the free spaCy course:
Ines recorded video versions in
English and
German that you can view as
standalone lessons on YouTube, or watch as part of the interactive online
course.
- 📺 Jun 13: June saw
another new episode of Vincent
Warmerdam’s “Intro to NLP with spaCy” series. In this episode, he dug deeper
into the performance of the NER model he trained, using a rule-based
classifier to probe for errors and improve the training data.
- 💫 Jun 16: We also released spaCy v2.3, which added
trained pipelines for Chinese, Japanese, Danish, Polish and Romanian, updated
all 15 model families with word vectors and improved accuracy, while also
decreasing model size and loading times for models with vectors.
- ✨ Jun 16: Prodigy got a big upgrade in June with the
release of v1.10.0. This release
includes a bunch of new features, interfaces and recipes for dependency and
relation annotation, audio and video annotation, as well as a new and improved
manual image annotation interface with support for editing shapes and bounding
boxes.
- 📺 Jun 16: To show you the new Prodigy features in action, Ines recorded
a video walkthrough that
includes examples of dependency and relation annotation, coreference
resolution, biomedical event extraction, audio and video annotation, NER
annotation for fine-tuning transformers and more!
- 🎤 Jun 18: At Rasa’s Level 3 AI Assistant conference, Ines talked about
“Designing Practical NLP Solutions”,
how to break down larger business problems into solvable machine learning
tasks, and how to make your NLP projects fail less.
- 💻 Jun 21: We released
spacy-streamlit,
a Python library containing building blocks and visualizers for
integrating spaCy pipelines into Streamlit apps.
- 📺 Jun 25: Finally, we published a
Spanish video version of the
free online course, presented by
Camila Gutierrez. ¡Practiquemos!
- 📻 Oct 4: Sebastián was a guest on the
Talk Python podcast
to discuss building modern and fast APIs with FastAPI.
- 📻 Oct 13: On the
DevJourney Podcast,
Ines shared her personal software development journey, from getting her first
computer to becoming a core developer of spaCy and founding Explosion.
- 💫 Oct 15: In mid-October, we finally published the long-awaited
nightly pre-release of spaCy v3.0! spaCy v3.0
features all-new transformer-based pipelines that bring spaCy’s accuracy right
up to the current state-of-the-art. You can use any pretrained transformer to
train your own pipelines, and even share one transformer between multiple
components with multi-task learning. Training is now fully configurable and
extensible, and you can define your own custom models using PyTorch,
TensorFlow and other frameworks. The new spaCy projects system lets you
describe whole end-to-end workflows in a single file, giving you an easy path
from prototype to production, and making it easy to clone and adapt
best-practice projects for your own use cases.
- 🎤 Oct 26: In her
keynote at Global AI Live,
Ines presented the upcoming spaCy v3.0 and how it makes it easier than ever to
bring state-of-the-art NLP projects from prototype to production.
- 🐍 Oct 27: Ines was honored to be recognized as a
Python Software Foundation Fellow,
for her work with Explosion on spaCy and other projects.
- 📻 Oct 29: Wrapping up October, Ines and Sofie joined the
Gradient Dissent podcast hosted
by Weights & Biases to talk about spaCy v3.0 and the new features, the
motivation behind the new release and the various design decisions we made
along the way.
- 📰 Dec 4: For
KDnuggets,
Ines shared her perspective on AI and Machine Learning developments in 2020
and key trends for 2021.
- 💫 Dec 11: In December, GitHub introduced discussion boards, so we
officially launched the
spaCy discussion board! Come
join the community and ask for help with your code, share tips, tricks and
best practices, discuss features and project ideas, collaborate on language
support, show off what you’ve built and stay up to date with the latest spaCy
news!
- 💘 Dec 14: To celebrate another year (and Ines’ birthday!), we started
another round of sending
spaCy stickers to
the community! This time with new designs, including cool holographic styles.
You can still
sign up here
to receive yours!
- 📻 Dec 28: Wrapping up 2020, Ines joined the
Python Year in Review episode
of Talk Python to look back at what 2020 had in store and what to
expect for 2021.

With the community and the team continuing to grow, we look forward to making 2021 even better. Thanks for all your support!