Articles for category: AI Tools

Enabling communities to collectively build better datasets together using Argilla and Hugging Face Spaces

Recently, Argilla and Hugging Face launched Data is Better Together, an experiment to collectively build a preference dataset of prompt rankings. In a few days, we had 350 community contributors labeling data and over 11,000 prompt ratings (see the progress dashboard for the latest stats!). This resulted in the release of 10k_prompts_ranked, a dataset consisting of 10,000 prompts with user ratings for the quality of the prompt. We want to enable many more projects like this! In this post, we’ll discuss why we think it’s essential for the community to collaborate on building datasets and share an invitation to join the
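For readers who want to explore the released data, here is a minimal sketch using the datasets library; the repository id DIBT/10k_prompts_ranked is an assumption based on the dataset name mentioned above and may need adjusting.

```python
from datasets import load_dataset

# Load the community-built prompt-ranking dataset from the Hugging Face Hub.
# NOTE: the repo id below is an assumption; adjust it to the actual dataset location.
ds = load_dataset("DIBT/10k_prompts_ranked", split="train")

# Inspect one example: each row holds a prompt plus community quality ratings.
print(ds[0])
print(f"{len(ds)} prompts loaded")
```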

The Principle Analysis of Frontend Monitoring SDK

A complete frontend monitoring platform consists of three parts: data collection and reporting, data processing and storage, and data visualization. This article focuses on the first component, data collection and reporting. Below is an outline of the topics we’ll cover. Since theoretical knowledge alone can be difficult to grasp, I’ve created a simple monitoring SDK that implements these technical concepts. You can use it to create simple demos and gain a better understanding; reading this article while experimenting with the SDK will provide the best learning experience. Collect Performance Data: The Chrome developer team has proposed a series of

Introducing spaCy v3.3 · Explosion

We’re pleased to present v3.3 of the spaCy Natural Language Processing library. spaCy v3.3 improves the speed of nearly all statistical pipeline components, adds a trainable lemmatizer and includes new trained pipelines for Finnish, Korean and Swedish. spaCy v3.3 includes a slew of improvements that increase the speed of all core pipeline components in training and inference. For longer texts, the trained pipeline speeds improve by 15% or more in prediction. Detailed benchmarks for en_core_web_md show the speed improvements for spaCy v3.2 vs v3.3. Speed Benchmarks (en_core_web_md): CPU, Avg. Words/Doc, v3.2 Words/Sec, v3.3 Words/Sec, Diff; Intel Xeon W-2265, 100, 17292
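To get a feel for the kind of numbers in that benchmark, here is a minimal throughput sketch; it assumes the en_core_web_md pipeline is installed locally and is not the benchmarking setup used by the spaCy team.

```python
import time
import spacy

# Rough prediction-throughput check in the spirit of the benchmark above.
# Assumes the pipeline has been installed, e.g. via:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

texts = ["This is a sample sentence about natural language processing."] * 1000

start = time.perf_counter()
n_words = sum(len(doc) for doc in nlp.pipe(texts, batch_size=64))
elapsed = time.perf_counter() - start

print(f"{n_words / elapsed:,.0f} words/sec")
```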

How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Models are becoming quite good at understanding text on its own, but what about text in images, which gives important contextual information? For example, navigating a map, or understanding a meme? The ability to reason about the interactions between the text and visual context in images can power many real-world applications, such as AI assistants, or tools to assist the visually impaired. We refer to these tasks as “context-sensitive text-rich visual reasoning tasks”. At the moment, most evaluations of instruction-tuned large multimodal models (LMMs) focus on testing how well models can respond to human instructions posed as questions or imperative
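As an illustration of what such an evaluation query can look like in practice, here is a minimal sketch that poses a text-in-image question to an open multimodal model via the transformers image-to-text pipeline; the model id, prompt format, image path, and question are assumptions for illustration, not the setup used in the post.

```python
from transformers import pipeline
from PIL import Image

# Minimal sketch: ask a context-sensitive question about text in an image.
# The model id and prompt template are assumptions (LLaVA-1.5 conventions);
# swap in whichever instruction-tuned multimodal model you want to evaluate.
pipe = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")

image = Image.open("bus_schedule.png")  # hypothetical text-rich image
prompt = "USER: <image>\nWhen does the last bus to the airport leave? ASSISTANT:"

result = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 100})
print(result[0]["generated_text"])
```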

February 2025 Top 40 New CRAN Packages

Artificial Intelligence. chores v0.1.0: Provides a collection of ergonomic large language model assistants designed to help you complete repetitive, hard-to-automate tasks quickly. After selecting some code, press the keyboard shortcut you’ve chosen to trigger the package app, select an assistant, and watch your chore be carried out. Users can create custom helpers just by writing some instructions in a markdown file. There are three vignettes: Getting started, Custom helpers, and Gallery. gander v0.1.0: Provides a Copilot completion experience that knows how to talk to the objects in your R environment. ellmer chats are integrated directly into your RStudio and Positron

Developing Secure AI Models with Differential Privacy

Introduction: Artificial Intelligence (AI) has become an integral part of our daily lives; from chatbots to self-driving cars, it is continuously evolving and improving our efficiency. With the rapid growth of AI come growing concerns about data privacy and security. To address these concerns, Differential Privacy (DP) has emerged as a prominent technique for developing secure AI models. Advantages of Differential Privacy: Protection of Sensitive Data: DP protects sensitive data by adding noise, preserving individual privacy while still providing accurate aggregate results. Trade-off between Privacy and Utility: With DP, there is a trade-off
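To make the noise-addition idea concrete, here is a minimal sketch of the Laplace mechanism applied to a private mean; the epsilon value and data are illustrative, and a production system would use a vetted DP library rather than hand-rolled noise.

```python
import numpy as np

def private_mean(values, lower, upper, epsilon):
    """Differentially private mean via the Laplace mechanism.

    Each value is clipped to [lower, upper], so one individual can shift the
    mean by at most (upper - lower) / n (the sensitivity). Laplace noise
    scaled to sensitivity / epsilon is then added to the true mean.
    """
    values = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

# Illustrative data: ages of individuals in a dataset.
ages = np.array([23, 35, 41, 29, 52, 60, 19, 44])
print(private_mean(ages, lower=0, upper=100, epsilon=1.0))
```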

Advanced NLP for Diverse Languages

Developing natural language processing (NLP) pipelines for languages other than English remains a challenge. Since its release in 2015, spaCy has become one of the most popular open-source libraries for applied natural language processing in Python, enabling a wide range of applications across different use cases and domains. In this talk, I discuss spaCy’s philosophy for modern NLP, its extensible design and recent features that enable the development of advanced natural language processing pipelines for typologically diverse languages.

CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG

Embedding models are useful for many applications such as retrieval, reranking, clustering, and classification. In recent years, the research community has made significant advances in embedding models, leading to substantial improvements in all applications built on semantic representations. Models such as BGE, GTE, and E5 sit at the top of the MTEB benchmark and in some cases outperform proprietary embedding services. Hugging Face’s Model Hub hosts a variety of model sizes, from lightweight (100-350M parameters) to 7B models (such as Salesforce/SFR-Embedding-Mistral). The lightweight models based on an encoder architecture are ideal candidates for optimization and utilization
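As a point of reference before any optimization, here is a minimal sketch of extracting sentence embeddings from a lightweight encoder with plain transformers; the model id and pooling choice are assumptions based on the BGE family mentioned above, not the Optimum Intel setup described in the post.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Baseline embedding extraction with a small encoder model (the model id is
# one example of a lightweight BGE-family encoder, not the optimized path).
model_id = "BAAI/bge-small-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = [
    "How do I optimize an embedding model for CPU?",
    "Lightweight encoders are good candidates for quantization.",
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# BGE-style pooling: take the [CLS] token embedding and L2-normalize it.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
```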

8.6 Getting to Know More Stream Methods

1. Introduction to Stream Methods The chapter presents additional methods of the Java 8 Stream API. Not all methods are covered, but the most interesting ones are highlighted. Best practices for using streams are also discussed. 2. Working with Iterators 🔹 Stream does not implement Iterable. If we try to traverse a Stream with a for-each, we get a compilation error: for (Usuario u : usuarios.stream()) { // compilation error // ... } This happens because a Stream is not reusable: once consumed, it cannot be used again. 🔹 Using an Iterator to traverse a Stream Iterator i

Predicting GitHub Tags · Explosion

One could learn how an oven works, but that doesn’t mean you’ve learned how to cook. Similarly, one could understand the syntax of a machine learning tool and still not be able to apply the technology in a meaningful way. That’s why in this blog post I’d like to describe some topics surrounding the creation of a spaCy project that aren’t directly related to syntax and instead relate more to “the act” of doing an NLP project in general. As an example use case to focus on, we’ll be predicting tags for GitHub issues. The goal isn’t to discuss the