wjbmattingly/spacyex: spaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax.

spaCyEx is an extension for spaCy designed to make token pattern matching as flexible and easy as writing regular expressions. It builds on spaCy’s Matcher, adding a compact string syntax for defining complex patterns, which makes it well suited to extracting linguistic features from text.

You can install spaCyEx via pip (assuming the package is published under the same name as its import, spacyex):

pip install spacyex

Dynamic Pattern Creation: Create complex token matching patterns using a simple string-based syntax.
Integration with spaCy: Leverage spaCy’s Matcher capabilities to find sequences in text that match defined patterns.
Customizable Matching Rules: Define token attributes including text characteristics, lexical attributes, and grammatical properties.

Define patterns using a string syntax where each token and its attributes are encapsulated by parentheses. Token attributes are specified by key-value pairs, separated by an equals sign (=), and multiple attributes are divided by a pipe (|).

Single Attribute: (pos=NOUN)
Multiple Attributes: (pos=NOUN|lemma=run)
Using List Values: (lemma=in[run,walk])
Using Operators: (ent_type=person|op={2,3})
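To make the syntax above concrete, here is a minimal sketch of how such a string could be turned into the list-of-dicts form that spaCy’s Matcher accepts. It is not spaCyEx’s own implementation: the function name parse_pattern and both regular expressions are hypothetical, written only from the rules described above. It uppercases attribute keys (POS, LEMMA, OP) as is conventional for Matcher patterns, but it leaves values untouched, so any normalization the library may apply (for example to entity labels) is not covered.

```python
import re

# Hypothetical helper for illustration only; not part of spaCyEx.
TOKEN_RE = re.compile(r"\(([^()]*)\)")   # one "(...)" group per token
LIST_RE = re.compile(r"^in\[(.*)\]$")    # list values such as in[run,walk]


def parse_pattern(pattern: str) -> list:
    """Convert "(key=value|key=value) (...)" into a list of token dicts."""
    tokens = []
    for body in TOKEN_RE.findall(pattern):
        spec = {}
        for pair in body.split("|"):
            if not pair.strip():
                continue  # ignore empty attribute slots
            key, _, value = pair.partition("=")
            key, value = key.strip().upper(), value.strip()
            list_match = LIST_RE.match(value)
            if list_match:
                # in[a,b] becomes a set-membership constraint
                items = [v.strip() for v in list_match.group(1).split(",")]
                spec[key] = {"IN": items}
            else:
                # plain values and operators such as {2,3} are kept as strings
                spec[key] = value
        tokens.append(spec)
    return tokens


if __name__ == "__main__":
    print(parse_pattern("(pos=NOUN|lemma=run)"))
    print(parse_pattern("(lemma=in[run,walk]) (ent_type=person|op={2,3})"))
```

Each parenthesized group yields one dict, so the order of groups in the string is the order of tokens the pattern expects.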

Once a pattern is defined, it can be used to search text for matches.

Here is a simple example to get started with spaCyEx:

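import spacyex as se
import spacy

# Load a spaCy pipeline that provides entities, lemmas, and part-of-speech tags
nlp = spacy.load("en_core_web_sm")
text = "John Smith runs fast, but Jacob Smith walks slowly."
# Two person-entity tokens, then a form of "run" or "walk", then an adverb
pattern = "(ent_type=person|op={2}) (lemma=in[run,walk]) (pos=ADV)"

# Search the text with the pattern, using the loaded pipeline
results = se.search(pattern, text, nlp)
for match in results:
    # Each result holds the matched span plus its start and end positions
    print(match[0].text, "Start:", match[1], "End:", match[2])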

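This code searches the text for sequences that fit the pattern: two consecutive tokens tagged as a person entity, followed by a token whose lemma is "run" or "walk", followed by an adverb. The pattern therefore combines named entities, lemmas, and parts of speech in a single expression.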
