Articles for category: AI Tools

Vision Language Models Explained

Vision language models are models that can learn simultaneously from images and text to tackle many tasks, from visual question answering to image captioning. In this post, we go through the main building blocks of vision language models: get an overview, grasp how they work, figure out how to find the right model, how to use them for inference, and how to easily fine-tune them with the new version of trl released today! What is a Vision Language Model? Vision language models are broadly defined as multimodal models that can learn from images and text. They are a type of

Thoughts on Functional Programming in Scala Course (Coursera)

At Lazada’s Data Science team, I use Spark a fair bit, especially when the data gets big (e.g., online behavioural and transaction data). While PySpark, the Python API for Spark, was available when I started, I decided early on to code in Scala. Perhaps I relished the challenge or just wanted to pick up a new language. Why take the Functional Programming in Scala course? Before the course, my programming skills in Scala were mainly self-taught, through the school of hard knocks and Stack Overflow. Thus, when the course was made available on Coursera, I saw the opportunity to learn

How Does Minikube Handle Storage Volumes in 2025?

Minikube continues to be a vital tool for developers looking to create Kubernetes clusters locally. As of 2025, Minikube’s handling of storage volumes has evolved to enhance flexibility and performance. Understanding how Minikube manages storage is crucial for efficiently deploying and maintaining applications. This article delves into how Minikube manages storage volumes, offering insights into the new improvements and practices. Understanding Storage Volumes in Minikube Kubernetes employs volumes to manage persistent data; Minikube manages these volumes efficiently even in constrained local environments. In 2025, enhancements to Minikube have improved how volumes are created, assigned, and utilized across various nodes. Types

Coreference Resolution in spaCy

In everyday conversation, we use pronouns or other expressions to refer to entities in many different ways, but we effortlessly understand these references. In NLP this is a challenging problem known as Coreference Resolution. In this video, we’ll show how to train spaCy’s new component for Coreference Resolution and how to apply the pipeline to resolve references in a text.

A Powerful 8B Vision-Language Model for the community

We are excited to release Idefics2, a general multimodal model that takes as input arbitrary sequences of texts and images, and generates text responses. It can answer questions about images, describe visual content, create stories grounded in multiple images, extract information from documents, and perform basic arithmetic operations. Idefics2 improves upon Idefics1: with 8B parameters, an open license (Apache 2.0), and enhanced OCR (Optical Character Recognition) capabilities, Idefics2 is a strong foundation for the community working on multimodality. Its performance on Visual Question Answering benchmarks is top of its class size, and competes with much larger models such as LLava-Next-34B

Product Classification API Part 1: Data Acquisition

To gain practice with building data products end-to-end, I recently developed a product classification API. The API helps classify products based on their titles: instead of figuring out which category your product belongs to (out of thousands), you can provide the title and the API returns the top 3 most likely categories. (GitHub repository) Update: API discontinued to save on cloud cost. Input: Title. Output: Suggested categories. This is part of a series of posts on building a product classification API: Where did I get the product data from? I initially intended to build a web scraper to collect product data

Video streaming tutorial. JavaScript WebRTC streaming example 🌐LibreRemotePlay

Hi, in this post I will cover only a simple WebRTC setup for streaming with getDisplayMedia() (screen), which is also valid for getUserMedia() (camera, microphone). First, if you are interested in more WebRTC explanations and tutorials, this is part of a series of WebRTC articles for JavaScript and Golang (this includes general knowledge about WebRTC). Second, I will introduce the fundamentals of MediaChannels, what they are and how to use them in your web app. In a PeerConnection we can create two types of channels: DataChannels (used for passing general-purpose data between peers) and MediaChannels (for media data

AI Apps in a Flash with Gradio’s Reload Mode

In this post, I will show you how you can build a functional AI application quickly with Gradio’s reload mode. But before we get to that, I want to explain what reload mode does and why Gradio implements its own auto-reloading logic. If you are already familiar with Gradio and want to get to building, please skip to the third section. What Does Reload Mode Do? To put it simply, it pulls in the latest changes from your source files without restarting the Gradio server. If that does not make sense yet, please continue reading. Gradio is a popular Python

SortMySkills is now live!

SortMySkills is now live on Datagene.io! Check it out here. Update: API discontinued to save on cloud cost. A very simple UI so you can focus on the cards. What is SortMySkills? It is a card sorting game that helps users discover their passion by sorting skills into those they like and dislike using. Users sort 50 general skills on a 5-point scale, ranging from “Love using (energizes me)” to “Neutral (has little or no effect)” to “Hate using (depletes me)”. However, only five skills can be sorted into “Love using”, making you decide which skills you truly enjoy, and should