March 29, 2025
Neural edit-tree lemmatization for spaCy · Explosion
We are happy to introduce a new, experimental, machine learning-based lemmatizer that posts accuracies above 95% for many languages. This lemmatizer learns to predict lemmatization rules from a corpus of examples and removes the need to write an exhaustive set of per-language lemmatization rules.

spaCy provides a Lemmatizer component for assigning base forms (lemmas) to tokens. For example, it lemmatizes the sentence "The kids bought treats from various stores." to its base forms: "the kid buy treat from various store". Lemmas are useful in many applications. For example, a search engine could use lemmas to match all inflections of a