Articles for category: AI Tools

Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]

In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of Large Language Models, such as reasoning or in-context learning. From prior work (e.g., Liu et al., 2023) and our experiments, we’ve seen that these phenomena cannot be explained by reaching globally minimal test loss – the target of statistical generalization. In other words, model comparison based on the test loss is nearly meaningless. We identified three areas where more research is required: Understanding the role of inductive biases in LLM training, including the role of architecture,

Introducing Enhanced Agent Evaluation | Databricks Blog

Earlier this week, we announced new agent development capabilities on Databricks. After speaking with hundreds of customers, we’ve noticed two common challenges to advancing beyond pilot phases. First, customers lack confidence in their models’ production performance. Second, customers don’t have a clear path to iterate and improve. Together, these often lead to stalled projects or inefficient processes where teams scramble to find subject matter experts to manually assess model outputs. Today, we’re addressing these challenges by expanding Mosaic AI Agent Evaluation with new Public Preview capabilities. These enhancements help teams better understand and improve their GenAI applications through customizable, automated

📊 Prodigy Dashboard: beta testers wanted for new plugin

Ines Montani (ines), December 19, 2024, 11:20am: Hey everyone! We’re happy to introduce a new Prodigy plugin we’ve been working on that’s now available for beta testing. Prodigy Dashboard adds a new dashboard command that starts a web application for viewing annotations, data analytics, metrics, and progress. It runs in your Prodigy environment and automatically connects to your database. At the moment, the dashboard is view-only, but we’re planning to add non-destructive editing for data as well. (This requires some deeper breaking changes to the database tables, though.) We also want to include more detailed inter-annotator agreement metrics, similar

Stable Diffusion 3.5 is here

We’re excited to announce that Stable Diffusion 3.5, the latest and most powerful text-to-image model from Stability AI, is now available on Replicate. It brings significant improvements in image quality, better prompt understanding, and support for a wide range of artistic styles. Stable Diffusion 3.5 comes in three variants. You can generate images using Stable Diffusion 3.5 right away. Try this in Python:

import replicate

output = replicate.run(
    "stability-ai/stable-diffusion-3.5-large",
    input={"prompt": "A watercolor painting of a futuristic city skyline at dawn"}
)
print(output.url)

Or use JavaScript:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

const [output] = await

The Powerhouse GPU Revolutionizing Deep Learning

Introduction The rise of Large Language Models (LLMs) has marked a significant advancement in the era of Artificial Intelligence (AI). During this period, cloud Graphics Processing Units (GPUs) offered by Paperspace + DigitalOcean have emerged as pioneers in providing high-quality NVIDIA GPUs, pushing the boundaries of computational technology. NVIDIA, founded in 1993 by three visionary computer scientists – Jen-Hsun (“Jensen”) Huang, former director at LSI Logic and microprocessor designer at AMD; Chris Malachowsky, an engineer at Sun Microsystems; and Curtis Priem, senior staff engineer and graphics chip designer at IBM and Sun Microsystems – embarked on its journey

Benchmarking Single Agent Performance

Over the past year, there has been growing excitement in the AI community around LLM-backed agents. What remains relatively unanswered and unstudied is the question of “which agentic architectures are best for which use cases”. Can I use a single agent with access to many tools, or should I set up a multi-agent architecture with clearer domains of responsibility? One of the most basic agentic architectures is the ReAct framework, which is what we’ll be exploring in this first series of experiments. In this study, we aim to answer the following question: At what point does a
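To make the ReAct pattern concrete, here is a minimal, self-contained sketch of the loop it describes: the model alternates Thought/Action steps, the harness runs the named tool and feeds the Observation back, until the model emits a final Answer. This is not the study's actual setup — the tool, the scripted stand-in for the LLM, and the step format are illustrative assumptions.

```python
def calculator(expression: str) -> str:
    """Toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_llm(history: str) -> str:
    """Stand-in for a real model: emits the next ReAct step.

    A real agent would call an LLM with the history as its prompt.
    """
    if "Observation:" not in history:
        return "Thought: I should compute this.\nAction: calculator[2 + 3]"
    return "Answer: 5"

def react_loop(question: str, max_steps: int = 5) -> str:
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_llm(history)
        history += "\n" + step
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        # Parse "Action: tool[input]", run the tool, append the observation.
        action_line = step.splitlines()[-1]
        name, arg = action_line.removeprefix("Action: ").rstrip("]").split("[", 1)
        history += f"\nObservation: {TOOLS[name](arg)}"
    return "no answer"

print(react_loop("What is 2 + 3?"))  # → 5
```

The single-agent variant in the question above is exactly this loop with more entries in the tool registry; the multi-agent alternative would split the registry across several such loops with a router in front.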

Writing as a Way of Thinking

Was this newsletter forwarded to you? Sign up to get it in your inbox. I always love an article that confirms my priors, so I was particularly excited to read Paul Graham’s latest essay, “Writes and Write-Nots.” In case you missed the piece, he argues that writing is an extension of thinking. He believes that as a skill, writing well is already very unequally distributed. And in a world with AI, where your writing is instantly up-leveled by bots like ChatGPT, even fewer people will need to learn to write well. Therefore, even fewer people will think well—dividing the world

Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures” – SemiAnalysis

In our pursuit of becoming a better full-service research firm, we’ve moved off Substack. For any questions please read https://semianalysis.com/faq/#substack There has been an increasing amount of fear, uncertainty and doubt (FUD) regarding AI scaling laws. A cavalcade of part-time AI industry prognosticators has latched on to any bearish narrative they can find, declaring the end of the scaling laws that have driven the rapid improvement in Large Language Model (LLM) capabilities over the last few years. Journalists have joined the dogpile and supported these narratives, armed with noisy leaks filled with vague information around the failure of models

A quote from Steve Klabnik

[…] in 2013, I did not understand that the things I said had meaning. I hate talking about this because it makes me seem more important than I am, but it’s also important to acknowledge. I saw myself at the time as just Steve, some random guy. If I say something on the internet, it’s like I’m talking to a friend in real life, my words are just random words and I’m human and whatever. It is what it is. But at that time in my life, that wasn’t actually the case. I was on the Rails team, I was

Release notes for Deephaven version 0.35

Deephaven Community Core version 0.35.0 was recently released. This release was the culmination of many big plans coming together. It includes a number of new features, improvements, breaking changes, and bug fixes. Without further ado, let’s dive in. Apache Iceberg integration We’ve been working on our Iceberg integration for a while now, and it’s finally here! Iceberg is a high-performance format for huge analytic tables, similar to Deephaven. The new interface allows you to get Iceberg namespaces, read Iceberg tables into Deephaven tables, get information on snapshots of Iceberg tables, and obtain all available tables in an Iceberg namespace. Below