Articles for category: AI Tools

Visual Feature Learning Without Supervision

The field of computer vision is seeing a surge of foundation models, similar to those in natural language processing (NLP). These models aim to produce general-purpose visual features that can be applied across various image distributions and tasks without fine-tuning. The recent success of unsupervised learning in NLP paved the way for similar advancements in computer vision. This article covers DINOv2, an approach that leverages self-supervised learning to generate robust visual features. Figure 1: DINOv2 principal component analysis visualization (source: https://github.com/facebookresearch/dinov2). The DINOv2 Framework: In this section we cover the various components of the DINOv2 framework, including
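As a rough illustration of what "general-purpose visual features without fine-tuning" looks like in practice, here is a minimal sketch of extracting frozen DINOv2 embeddings via torch.hub. The entry-point names follow the public repository linked above, but the preprocessing values and file name are assumptions for this sketch, not code from the article.

```python
# Minimal sketch: extract image features from a frozen, pretrained DINOv2 backbone.
# Hub entry points follow the public facebookresearch/dinov2 repo; preprocessing
# values and "example.jpg" are placeholder assumptions.
import torch
from PIL import Image
from torchvision import transforms

# Load a small DINOv2 ViT backbone (used as-is, no fine-tuning).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# ImageNet-style preprocessing; DINOv2 uses 14x14 patches, so the crop size
# should be a multiple of 14 (e.g., 224).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    features = model(batch)  # one global feature vector per image

print(features.shape)  # e.g., torch.Size([1, 384]) for the ViT-S/14 backbone
```

These frozen embeddings can then be fed to a simple classifier or nearest-neighbor search across tasks, which is the use case the article's "general-purpose features" claim points at.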

Data Machina #253 – Data Machina

The Google AI Blast. This week OpenAI released a new closed model called GPT-4o (as in omni): Hello GPT-4o, a model that can reason across audio, vision, and text in real time. It seems the model's performance on many benchmarks wasn't as good as many AI pundits expected. And while much of the AI community was befuddled, debating the “flirtatiousness” of GPT-4o, Google came in and unleashed a massive AI storm, including SOTA models, powerful new open models, and some pretty amazing tools. Here's my summary of what Google released: Gemini 1.5 Pro model updates: Lots

AI This Week: New Agents, Open Models, and the Race for Productivity

Your AI Cheat Sheet: Manus, Gemma 3, OpenAI, and Global Regulations. Feeling overwhelmed by the pace of AI developments? Our latest collection of guides and cheat sheets covers everything from China's groundbreaking Manus agent to Google's Gemma 3 and the evolving regulatory landscape across major economies. Catch up on a transformative week in AI with these accessible explainers. Manus: What You Need To Know. Discover Manus, a groundbreaking AI agent from China's Monica.ai that's redefining what AI assistants can do. Unlike chatbots that simply respond, Manus autonomously tackles complex tasks—from researching real estate to analyzing resumes—by navigating the

Building the Same App Using Various Web Frameworks

Recently, I’ve been wondering whether I should migrate from my current web app stack (FastAPI, HTML, CSS, and a sprinkle of JavaScript) to a modern web framework. I was particularly interested in FastHTML, Next.js, and SvelteKit. FastHTML: Many folks have started building with it since Jeremy Howard launched it a month ago; its goal is to enable modern web applications in pure Python. Next.js: I’ve come across several apps built with it, such as cal.com and roomGPT; it has a large ecosystem and is popular for building production-grade web apps. SvelteKit: This lightweight framework has been popular with devs (Stack
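For context on the baseline being migrated from, here is a minimal, hypothetical sketch of the kind of route the current FastAPI + HTML stack implies; it is not the app from the article, just a plain server-rendered page that frameworks like FastHTML, Next.js, and SvelteKit would replace with their own rendering layer.

```python
# Hypothetical baseline in the current stack: a FastAPI route returning
# server-rendered HTML (not the article's actual app).
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

@app.get("/", response_class=HTMLResponse)
def index() -> str:
    # In a FastAPI + HTML/CSS/JS stack, pages are plain templates or strings;
    # the frameworks compared in the article replace this layer.
    return "<html><body><h1>Hello from FastAPI</h1></body></html>"

# Run with: uvicorn main:app --reload  (assuming this file is saved as main.py)
```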

Function Calling vs. Model Context Protocol (MCP): What You Need to Know

Integrating Large Language Models (LLMs) with external systems has transformed how businesses interact with technology. These models enable natural language inputs to control software, streamlining workflows and making operations more intuitive. However, integrating LLMs with external tools requires two key processes: translating user prompts into structured function calls (Function Calling), and executing those function calls within an organized system (the Model Context Protocol, or MCP). Both Function Calling and MCP play essential roles in LLM-driven automation: while Function Calling focuses on converting natural language into action-ready commands, MCP ensures those commands are executed efficiently and consistently. Let’s break down their differences and
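To make the Function Calling half concrete, here is a small, provider-agnostic sketch: the application describes a tool with a JSON schema, the model returns a structured call, and the application dispatches it. The schema shape, the get_weather tool, and the simulated model output are hypothetical, and this does not show MCP's actual wire protocol.

```python
# Illustrative sketch of Function Calling: a tool schema, a simulated structured
# call from the model, and a dispatcher that executes it. All names here
# (get_weather, the schema layout) are hypothetical examples.
import json

TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"Sunny and 22°C in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Execute a structured function call emitted by the model as JSON."""
    call = json.loads(model_output)        # e.g. {"name": ..., "arguments": {...}}
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])

# Simulated model output for the prompt "What's the weather in Paris?"
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

In this framing, Function Calling produces the structured JSON, while something like MCP standardizes how tools such as this registry are discovered and invoked across systems.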

Content Drive — How we organize and share billions of files in Netflix studio | by Netflix Technology Blog

Nov 25, 2024 · by Esha Palta, Ankur Khetrapal, Shannon Heh, Isabell Lin, Shunfei Chen. Netflix has pioneered the idea of a Studio in the Cloud, giving artists the ability to work from different corners of the world to create stories and assets that entertain the world. Starting at the point of ingestion, where data is produced out of the camera, the media goes through many stages, some of which are shown below. The media undergoes comprehensive backup routines at every stage and phase of this process, with frequent uploads and downloads. In order to support these processes

What Is AIaaS? Artificial intelligence as a service explained – BMC Software

Today, almost all companies use at least one kind of as-a-service offering as a way to focus on their core business and outsource other needs to third-party experts and vendors. Though software as a service has the largest global spend ($105 billion in 2020 alone), IaaS and PaaS are expected to grow faster in the coming years. Now the same as-a-service approach is being applied to a new field: AIaaS. AIaaS is short for artificial intelligence as a service. The term and the product are on the rise, and we’re digging into what AIaaS means in this article.

Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments

Takeaways: LoRA is one of the most widely used parameter-efficient finetuning techniques for training custom LLMs. From saving memory with QLoRA to selecting the optimal LoRA settings, this article provides practical insights for those interested in applying it. Introduction: Getting the Most out of LoRA. I’ve run hundreds, if not thousands, of experiments involving LoRA over the past few months. A few weeks ago, I took the time to delve deeper into some of the hyperparameter choices. This is more of an experimental diary, presented in sequential order, and I hope it proves useful to some. Specifically, I aim to
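For readers new to the technique, here is a minimal from-scratch sketch of the LoRA idea the experiments revolve around: the pretrained weight stays frozen and only a low-rank update scaled by alpha/r is trained. The values r=8 and alpha=16 are placeholders for illustration, not the settings the article's experiments recommend.

```python
# Minimal sketch of a LoRA-wrapped linear layer: freeze the pretrained weight W
# and learn a low-rank update (alpha / r) * B @ A. Hyperparameters here are
# placeholders, not the article's recommended settings.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.scale = alpha / r
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrap a "pretrained" projection layer and count what actually trains.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12,288 parameters instead of 768 * 768 + 768
```

QLoRA keeps this same low-rank update but stores the frozen base weights in a quantized format to cut memory further, which is the trade-off the experiments examine.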