We’re excited to release v3.5 of the spaCy Natural Language
Processing library. spaCy v3.5 introduces three new CLI commands, adds fuzzy
matching, provides improvements to our entity linking functionality, and
includes a range of language updates and bug fixes.
New CLI commands
apply: applies a pipeline to one or more .txt, .jsonl or .spacy files
benchmark speed: profiles a pipeline’s speed with a warmup and a confidence interval
find-threshold: tests a range of threshold values for spancat, textcat_multilabel, etc., to identify the optimal one
Examples on how to run these commands can be found in our
CLI documentation as well as in our
v3.5 usage notes.
Fuzzy matching
The new FUZZY operator allows fuzzy matches based on Levenshtein edit distance:
pattern = [{"LOWER": {"FUZZY": "definitely"}}]
The FUZZY and REGEX operators are now also supported for lists with IN and NOT_IN:
pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
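To show the FUZZY operator in context, here is a minimal sketch that registers the first pattern above with a Matcher and runs it on a misspelled sentence. The blank English pipeline, the rule name "DEFINITELY" and the sample text are illustrative assumptions, not part of the release notes.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, since the
# pattern only relies on the tokenizer and the LOWER attribute.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "DEFINITELY" is an arbitrary rule name chosen for this sketch.
matcher.add("DEFINITELY", [[{"LOWER": {"FUZZY": "definitely"}}]])

doc = nlp("I definately agree, and she definitely does too.")

# Collect the text of every matched span.
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```

Both the misspelled and the correctly spelled token should be picked up, because an exact match has an edit distance of zero.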
Entity linking
The entity linker’s knowledge base has been refactored for easier customization.
KnowledgeBase is now an abstract class and the default implementation is the new
class InMemoryLookupKB.
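As a rough illustration of the default implementation, the sketch below builds a tiny InMemoryLookupKB by hand and looks up the candidates for an alias. The entity ID, frequency, vector values and alias are made-up sample data, and the calls assume the spaCy v3.5 API.

```python
import spacy
from spacy.kb import InMemoryLookupKB

nlp = spacy.blank("en")

# Assumption: 3-dimensional entity vectors, chosen only to keep the example small.
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)

# Hypothetical entity and alias, used purely as sample data.
kb.add_entity(entity="Q42", freq=10, entity_vector=[1.0, 0.0, 0.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.9])

# Look up which entities the alias may refer to.
candidates = kb.get_alias_candidates("Douglas")
print([c.entity_ for c in candidates])
```

A custom knowledge base would instead subclass KnowledgeBase and supply its own candidate lookup, which is the customization the refactoring is meant to make easier.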
Read more about all the improvements, updates and bug fixes:
Many cool new plugins, extensions, pipelines and tutorials have been added to
the spaCy universe and
spaCy projects since v3.4:
Additionally, the spaCy team has added demo projects for two newer components: