Articles for category: AI Tools

Neural edit-tree lemmatization for spaCy · Explosion

We are happy to introduce a new, experimental, machine learning-based lemmatizer that posts accuracies above 95% for many languages. This lemmatizer learns to predict lemmatization rules from a corpus of examples and removes the need to write an exhaustive set of per-language lemmatization rules. spaCy provides a Lemmatizer component for assigning base forms (lemmas) to tokens. For example, it lemmatizes the sentence "The kids bought treats from various stores." to its base forms: "the kid buy treat from various store". Lemmas are useful in many applications. For example, a search engine could use lemmas to match all inflections of a
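To make the idea concrete, here is a minimal, illustrative sketch of the kind of hand-written, per-language lemmatization rules that the new lemmatizer learns automatically from data. This is not spaCy's actual edit-tree algorithm; the lookup table and suffix rules below are toy assumptions for the example sentence.

```python
# Toy rule-based lemmatizer: irregular forms come from a lookup table,
# regular inflections from ordered suffix-rewrite rules. Real systems
# also condition on part-of-speech and context.
IRREGULAR = {"bought": "buy"}
SUFFIX_RULES = [("ies", "y"), ("es", "e"), ("s", "")]

def lemmatize(token: str) -> str:
    token = token.lower()
    if token in IRREGULAR:
        return IRREGULAR[token]
    for suffix, replacement in SUFFIX_RULES:
        # Require a non-trivial stem so short words are left alone.
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)] + replacement
    return token

print([lemmatize(t) for t in ["kids", "bought", "treats", "stores"]])
# → ['kid', 'buy', 'treat', 'store']
```

Maintaining such tables and rules for every language is exactly the burden the learned lemmatizer removes.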

From OpenAI to Open LLMs with Messages API on Hugging Face

We are excited to introduce the Messages API to provide OpenAI compatibility with Text Generation Inference (TGI) and Inference Endpoints. Starting with version 1.4.0, TGI offers an API compatible with the OpenAI Chat Completion API. The new Messages API allows customers and users to transition seamlessly from OpenAI models to open LLMs. The API can be directly used with OpenAI’s client libraries or third-party tools, like LangChain or LlamaIndex. “The new Messages API with OpenAI compatibility makes it easy for Ryght’s real-time GenAI orchestration platform to switch LLM use cases from OpenAI to open models. Our migration from GPT4 to
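As a sketch of what "OpenAI compatibility" means in practice, the request body below follows the OpenAI chat-completion format that TGI (>= 1.4.0) accepts. The base URL and model name are placeholders for your own Inference Endpoint or local TGI server, not values from this announcement.

```python
import json

# Hypothetical local TGI server exposing the OpenAI-compatible route
# at /v1/chat/completions; point an OpenAI client's base_url here.
BASE_URL = "http://localhost:8080/v1"

# Standard OpenAI-style chat payload; TGI serves whichever model it
# was launched with, so the "model" field is effectively a label.
payload = {
    "model": "tgi",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why are open LLMs useful?"},
    ],
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

Because the payload shape is unchanged, OpenAI client libraries and tools like LangChain or LlamaIndex only need the base URL swapped to target an open model.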

TypeScript Essentials: Crafting Simple Types

About TypeScript TypeScript is a highly expressive language that allows you to write algorithms using types alone. However, it is impossible to express every type. For example, in TypeScript, the maximum length of a tuple that can be represented by types is 999. Depending on the version of TypeScript, some restrictions may be relaxed or new features may be added. In this process, types that were previously impossible to express might become possible, but there are still types that cannot be represented. In most cases, this is not a matter of the language’s expressiveness but rather a restriction to prevent

Universal Dependencies v2.5 Benchmarks for spaCy · Explosion

To demonstrate the performance of spaCy v3.2, we present a series of UD benchmarks comparable to the Stanza and Trankit evaluations on Universal Dependencies v2.5, using the evaluation from the CoNLL 2018 Shared Task. The benchmarks show the competitive performance of spaCy's core components for tagging, parsing and sentence segmentation, and also let us highlight and evaluate the new edit-tree lemmatizer. The trained pipelines in the benchmarks are made available for download on Explosion's Hugging Face Hub repo, and a UD benchmark project lets you run the full training and evaluation for any Universal Dependencies corpus. The core syntactic

AMD Pervasive AI Developer Contest!

AMD and Hugging Face are actively engaged in helping developers seamlessly deploy cutting-edge AI models on AMD hardware. This year, AMD takes their commitment one step further by providing developers free, hands-on access to state-of-the-art AMD hardware through their recently announced Pervasive AI Developer Contest. This global competition is an incubator of AI innovation, beckoning developers worldwide to create unique AI applications. Developers can choose from three exciting categories: Generative AI, Robotics AI, and PC AI, each offering cash prizes of up to $10,000 USD for winners, with a total of $160,000 USD being given away. 700 AMD

An end-to-end spaCy pipeline for exploring health supplement effects · Explosion

Create better access to health with machine learning and natural language processing. Read about the journey of developing Healthsea, an end-to-end spaCy pipeline for analyzing user reviews of supplement products and extracting their potential effects on health. I'm a machine learning engineer at Explosion, and together with our fantastic team, we've been working on Healthsea to further expand the spaCy universe 🪐. In this blog post, I'll take you on the journey of training different NLP models, creating custom components and assembling them into a spaCy v3 pipeline! Table of contents Feel free to jump straight to the section that

Save money, time and carbon with open source

Should you fine-tune your own model or use an LLM API? Creating your own model puts you in full control but requires expertise in data collection, training, and deployment. LLM APIs are much easier to use but force you to send your data to a third party and create costly dependencies on LLM providers. This blog post shows how you can combine the convenience of LLMs with the control and efficiency of customized models. In a case study on identifying investor sentiment in the news, we show how to use an open-source LLM to create synthetic data to train your

Our Year in Review · Explosion

The year 2021 is coming to an end, and like the previous year, it was shaped by unique challenges that impacted our work together. For Explosion, it was a very productive year. We found an investor that fits our strategy, we released spaCy v3, the work on Prodigy Teams is in full swing, and the team has grown a lot. So here's our look back at our highlights of the year 2021. 💫 Feb 1: We kicked off February with the big release of spaCy v3.0, which features new transformer-based pipelines that get spaCy's accuracy right up to the current