Articles for category: AI Tools

Using Machine Learning to Improve Language Metadata on the Hugging Face Hub

tl;dr: We’re using machine learning to detect the language of Hub datasets with no language metadata, and librarian-bots to make pull requests to add this metadata. The Hugging Face Hub has become the repository where the community shares machine learning models, datasets, and applications. As the number of datasets grows, metadata becomes increasingly important as a tool for finding the right resource for your use case. In this blog post, I’m excited to share some early experiments which seek to use machine learning to improve the metadata for datasets hosted on the Hugging Face Hub. Language Metadata for Datasets on
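The excerpt doesn’t include the detection code itself, but the core idea can be sketched in a few lines: sample some rows from a dataset that lacks language metadata and run them through an off-the-shelf language-identification model. In the sketch below, the dataset ID and the “text” column are placeholders, and papluca/xlm-roberta-base-language-detection is simply one publicly available language-ID model, not necessarily the one used in the post.

```python
from collections import Counter

from datasets import load_dataset
from transformers import pipeline

# Placeholder dataset ID: any text dataset on the Hub without language metadata.
ds = load_dataset("some-user/some-dataset", split="train", streaming=True)
samples = [row["text"] for _, row in zip(range(20), ds)]  # assumes a "text" column

# One publicly available language-identification model (an assumption, not
# necessarily the model the post's pipeline relies on).
detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
)
predictions = detector(samples, truncation=True)

# A majority vote over the sampled rows gives a candidate language tag
# that could be proposed as dataset metadata.
language, votes = Counter(p["label"] for p in predictions).most_common(1)[0]
print(f"Predicted language: {language} ({votes}/{len(samples)} samples)")
```

A prediction like this is the kind of metadata the librarian-bot pull requests mentioned above then propose to add.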

Weekly #13-2025: AI, React, PHPxTKY meetup and More

Madhu Sudhan Subedi Tech Weekly.
Tokenize Image as a Set is a new framework for image generation that uses set-based tokenization and a novel discrete diffusion method. The approach represents images as unordered token sets, enabling a unique and invertible generative modeling process. Link
Perplexity AI envisions a TikTok that prioritizes deep content discovery and truth-seeking powered by an advanced answer engine. It plans to enhance the platform’s utility while maintaining its core function as a hub for creative expression. Link
A major trend in 2025 is the increasing adoption of React Server Components (RSC) as a standard and React

ICLRandD/Blackstone: ⚫ A spaCy pipeline and model for NLP on unstructured legal text.

Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D. Blackstone was written by Daniel Hoadley.

Why are we building Blackstone?
What’s special about Blackstone?
Observations and other things worth noting
Installation
    Install the library
    Install the Blackstone model
About the model
    The pipeline
    Named-Entity Recogniser
    Text categoriser
Usage
    Applying the NER model
        Visualising entities
    Applying the text categoriser model
Custom pipeline extensions
    Abbreviation and long-form definition resolution
    Compound case reference detections
    Legislation linker
    Sentence segmenter
Why
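As a taste of the usage sections listed above, here is a brief sketch of applying Blackstone’s NER pipeline, assuming the library and the en_blackstone_proto model have been installed as described in the README; the example sentence is invented.

```python
import spacy

# Load the Blackstone model (installed separately, per the README).
nlp = spacy.load("en_blackstone_proto")

# An invented example of unstructured legal text.
text = (
    "The claimant relied on section 3 of the Human Rights Act 1998, "
    "as considered in R v Secretary of State for the Home Department."
)

doc = nlp(text)
for ent in doc.ents:
    # The entity labels cover legal material such as case names, citations,
    # legislation and provisions.
    print(ent.text, ent.label_)
```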

Towards Encrypted Large Language Models with FHE

Large Language Models (LLMs) have recently proven to be reliable tools for improving productivity in many areas, such as programming, content creation, text analysis, web search, and distance learning. The Impact of Large Language Models on Users’ Privacy: Despite the appeal of LLMs, privacy concerns persist around the user queries processed by these models. On the one hand, leveraging the power of LLMs is desirable, but on the other hand, there is a risk of leaking sensitive information to the LLM service provider. In some areas, such as healthcare, finance, or law, this privacy risk is a showstopper. One

AI for Real-Time Collaboration in Web-Based DAWs

Introduction: Digital Audio Workstations (DAWs) have revolutionized the music production industry, allowing artists to create, edit, and produce music digitally. However, traditional DAWs are often limited to offline use or require complex file-sharing workflows. Web-based DAWs have emerged as a solution, providing cloud-based music production environments. Integrating AI into these platforms can further enhance real-time collaboration, making music production more intuitive, efficient, and accessible. The Role of AI in Web-Based DAWs: AI-powered features can significantly improve the real-time collaboration experience in web-based DAWs. These enhancements include: Intelligent Track Synchronization: AI can analyze tempo, key, and time signatures in real time to
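The article targets web-based DAWs, so a real implementation would run in the browser or on a collaboration server; purely as an illustration of the kind of analysis that intelligent track synchronization relies on, here is a rough offline sketch of tempo and pitch-class estimation using librosa. The file name is a placeholder and the key guess is deliberately naive.

```python
import librosa
import numpy as np

# "clip.wav" is a hypothetical local audio file used only for this sketch.
y, sr = librosa.load("clip.wav", sr=None)

# Tempo estimation via beat tracking.
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

# Very rough tonic guess: pick the strongest average chroma bin.
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
pitch_classes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
tonic = pitch_classes[int(np.argmax(chroma.mean(axis=1)))]

print(f"Estimated tempo: {float(tempo):.1f} BPM, strongest pitch class: {tonic}")
```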

Deploy MusicGen in no time with Inference Endpoints

MusicGen is a powerful music generation model that takes in a text prompt and an optional melody and outputs music. This blog post will guide you through generating music with MusicGen using Inference Endpoints. Inference Endpoints allow us to write custom inference functions called custom handlers. These are particularly useful when a model is not supported out of the box by the transformers high-level pipeline abstraction. transformers pipelines offer powerful abstractions to run inference with transformers-based models. Inference Endpoints leverage the pipeline API to easily deploy models with only a few clicks. However, Inference Endpoints can also be used to deploy models that don’t
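A custom handler is a handler.py that exposes an EndpointHandler class with __init__ and __call__ methods. Below is a condensed sketch of what such a handler for MusicGen could look like; the checkpoint name, generation length, and response format are assumptions, and the post’s actual handler may pre- and post-process differently (for example, to accept an optional melody).

```python
from typing import Any, Dict

from transformers import AutoProcessor, MusicgenForConditionalGeneration


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repository the endpoint was created from;
        # the fallback checkpoint here is an assumption for illustration.
        checkpoint = path or "facebook/musicgen-small"
        self.processor = AutoProcessor.from_pretrained(checkpoint)
        self.model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Inference Endpoints pass the request payload under the "inputs" key.
        prompt = data["inputs"]
        inputs = self.processor(text=[prompt], padding=True, return_tensors="pt")
        audio = self.model.generate(**inputs, max_new_tokens=256)
        # Return raw samples; serializing to an audio container is left out here.
        return {
            "audio": audio[0].cpu().numpy().tolist(),
            "sampling_rate": self.model.config.audio_encoder.sampling_rate,
        }
```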

How I Reduced My Oracle SQL Execution Time from 110s to 2s

🧩 The Problem: Recently, I encountered a performance bottleneck while writing a SQL query: the execution time was over 110 seconds. Here’s a simplified version of the query:

SELECT 'A' AS type,
       COUNT(adl.is_attend) AS count,
       ROUND(SUM(CASE WHEN adl.is_attend = 1 THEN 1 ELSE 0 END) / COUNT(adl.is_attend), 2) * 100 AS rate
FROM attendance adl
INNER JOIN (
    SELECT deptId
    FROM department
    START WITH deptId = '...'
    CONNECT BY up_daptId = PRIOR deptId
) nbd ON adl.deptId = nbd.deptId
WHERE TO_DATE(adl.date, 'yyyy-MM-dd') >= TO_DATE(#{begin_date}, 'yyyy-MM-dd')
  AND TO_DATE(adl.date, 'yyyy-MM-dd') <= TO_DATE(#{end_date}, 'yyyy-MM-dd')
UNION ALL
-- other queries with different deptId and

The Millennial Question

METHOD: We used the Event Registry API to scrape news articles about Millennials. The query filtered for news articles with the word “Millennials”, “millennials”, “Millennial”, or “millennial” in the headline, published between June 15, 2015 and June 15, 2019. This query yielded nearly 38,000 articles. We obtained article metadata, including the URL, title, body, and publishing date, from the query. Sometimes, multiple news outlets in the same media family publish the same article; removing these duplicates yielded a total of 26,565 articles. We used the spaCy Python package to part-of-speech tag the headline text. Part-of-speech tagging identifies each word’s part-of-speech
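For reference, part-of-speech tagging a headline with spaCy looks roughly like this; the model name and the example headline are placeholders rather than details taken from the study.

```python
import spacy

# Assumes the small English model is installed (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

# A made-up example headline.
doc = nlp("Millennials are killing the doorbell industry")
for token in doc:
    # token.pos_ is the coarse part of speech, token.tag_ the fine-grained tag.
    print(token.text, token.pos_, token.tag_)
```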

Run On-Device LLMs in Apple Devices

I have a lot of respect for iOS/Mac developers. I started writing apps for iPhones in 2007, when not even APIs or documentation existed. The new devices adopted some unfamiliar decisions in the constraint space, with a combination of power, screen real estate, UI idioms, network access, persistence, and latency that was different to what we were used to before. Yet, this community soon managed to create top-notch applications that felt at home with the new paradigm. I believe that ML is a new way to build software, and I know that many Swift developers want to incorporate AI features