Articles for category: AI Research

Visualizing Weights

This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks. Introduction: The problem of understanding a neural network is a little bit like reverse engineering a large compiled binary of a computer program. In this analogy, the weights of the neural network are the compiled assembly instructions. At the end of the day, the weights are the fundamental thing you want to understand: how does this sequence of convolutions and matrix multiplications give rise to model behavior? Trying to understand

The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond. Jiin Woo, Gauri Joshi, Yuejie Chi; 26(26):1–85, 2025. Abstract: In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning, which exhibit a linear speedup with respect to the number of agents and near-optimal dependencies on other salient problem parameters. In the asynchronous setting, existing analyses of federated Q-learning, which adopt an equally weighted
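
To make the "periodically aggregating local Q-estimates" idea concrete, here is a minimal sketch of federated tabular Q-learning with equal-weight periodic averaging. The environment interface (`env.reset`/`env.step`), the uniform behavior policy, the constant learning rate, and the sync period are illustrative assumptions, not the paper's exact algorithm or weighting scheme.

```python
import numpy as np

def federated_q_learning(envs, n_states, n_actions, gamma=0.99,
                         lr=0.1, total_steps=10_000, sync_period=50):
    """Sketch: each agent runs local Q-learning; a server periodically
    averages the local Q-tables and broadcasts the result back."""
    K = len(envs)                                    # number of agents
    Q_local = [np.zeros((n_states, n_actions)) for _ in range(K)]
    states = [env.reset() for env in envs]

    for t in range(total_steps):
        for k, env in enumerate(envs):
            s = states[k]
            a = np.random.randint(n_actions)         # uniform behavior policy (simplification)
            s_next, r, done = env.step(a)
            # Standard Q-learning update on the agent's own local data.
            target = r + gamma * Q_local[k][s_next].max()
            Q_local[k][s, a] += lr * (target - Q_local[k][s, a])
            states[k] = env.reset() if done else s_next

        # Periodic aggregation with equal weights (the setting the paper
        # contrasts with alternative weighting schemes).
        if (t + 1) % sync_period == 0:
            Q_avg = np.mean(Q_local, axis=0)
            Q_local = [Q_avg.copy() for _ in range(K)]

    return np.mean(Q_local, axis=0)
```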

Chemical space exploration with quantum computing

Identifying new drug-like molecules for undruggable proteins — those that have proven impossible to target with conventional drugs — remains a challenge. The prevailing theory is that we have simply not explored the full chemical space to identify previously unseen molecules that could potentially target these proteins. Therefore, methods that can explore the vast chemical space (~10^60 molecules) beyond the reach of conventional approaches are needed. A collaborative research group from labs around the world has published a hybrid quantum–classical generative model for small molecule design to target the KRAS protein, which has historically been resistant to drug discovery efforts,

Experiments in Weak-to-Strong Generalization | EleutherAI Blog

The EleutherAI interpretability team has been investigating weak-to-strong generalization in open-source models. In this post, we report some results on Qwen1.5 0.5B and Llama 3 8B. We observe consistent weak-to-strong generalization across 21 NLP datasets. We also investigate several other modifications to weak-to-strong training, with generally negative results: strong-to-strong training, modified loss functions, and several probe-based experiments. Among these, only the log-confidence auxiliary loss shows possible signs of consistently improving generalization. Introduction: Weak-to-Strong Generalization. In some circumstances, a stronger student model can outperform a weaker supervisor. Burns et al. (2024) demonstrate weak-to-strong generalization across several tasks and a range of
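
For readers unfamiliar with the auxiliary loss mentioned above, here is a minimal sketch of a confidence-style auxiliary objective in the spirit of Burns et al. (2024): the strong student is trained partly on the weak labels and partly on its own hardened predictions. The binary-classification setup, the mixing weight `alpha`, and this exact form are assumptions for illustration; the post's log-confidence variant may differ in detail.

```python
import torch
import torch.nn.functional as F

def weak_to_strong_loss(strong_logits, weak_labels, alpha=0.5):
    """strong_logits: (N,) logits from the strong student.
    weak_labels: (N,) hard labels in {0, 1} produced by the weak supervisor."""
    # Term 1: fit the weak supervisor's labels.
    weak_term = F.binary_cross_entropy_with_logits(
        strong_logits, weak_labels.float())
    # Term 2: push the student toward its own confident (hardened)
    # predictions, with a stop-gradient on the targets.
    hard_preds = (strong_logits.detach() > 0).float()
    conf_term = F.binary_cross_entropy_with_logits(strong_logits, hard_preds)
    return (1 - alpha) * weak_term + alpha * conf_term
```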

Stanford CRFM

As a first step towards holistic evaluation for VLMs, we have extended the HELM framework to evaluate vision-language models with new scenarios (datasets) and models. Visit our website for the prompts, raw model generations, and complete results. Vision-language models (VLMs), models that generate text given a hybrid text/visual prompt, have a wide range of use cases, including visual question-answering, text-driven image creation and alteration, image captioning, and robotics. However, our current understanding is limited by incomplete reporting — missing results for certain models on specific benchmarks and a notable absence of transparency regarding the prompting methodologies in the technical reports

Detecting Text Ghostwritten by Large Language Models – The Berkeley Artificial Intelligence Research Blog

The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. In addition, these models are also prone to producing text with factual errors, so wary readers may want to know if generative AI tools have been used to ghostwrite news articles or other sources before trusting them. What can teachers and consumers do? Existing tools to detect AI-generated text sometimes do poorly on data that differs from

Gemini Robotics brings AI into the physical world

Research | Published 12 March 2025 | Author: Carolina Parada. Introducing Gemini Robotics, our Gemini 2.0-based model designed for robotics. At Google DeepMind, we’ve been making progress in how our Gemini models solve complex problems through multimodal reasoning across text, images, audio and video. So far, however, those abilities have been largely confined to the digital realm. For AI to be useful and helpful to people in the physical realm, it has to demonstrate “embodied” reasoning — the humanlike ability to comprehend and react to the world around us — as well as safely take action to get things done. Today,

[2411.05979] Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits

[Submitted on 8 Nov 2024 (v1), last revised 11 Mar 2025 (this version, v2)] Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits, by Ha Manh Bui and 2 other authors. Abstract: By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance exploration and exploitation, we propose Neural-$\sigma^2$-LinearUCB, a variance-aware algorithm that utilizes $\sigma^2_t$, i.e., an upper bound of the reward noise variance at round $t$, to enhance the uncertainty quantification quality of the UCB, resulting
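
As a rough intuition for the variance-aware idea, here is a minimal sketch of a LinUCB-style learner operating on fixed deep features, where each observation is down-weighted by the variance upper bound $\sigma^2_t$ and the exploration bonus uses the resulting weighted Gram matrix. The regularizer `lam`, the bonus scale `beta`, and the update form are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class VarianceAwareLinUCB:
    def __init__(self, dim, lam=1.0, beta=1.0):
        self.A = lam * np.eye(dim)   # variance-weighted Gram matrix
        self.b = np.zeros(dim)       # variance-weighted feature-reward sums
        self.beta = beta

    def select(self, features):
        """features: (n_arms, dim) array of deep representations phi(x)."""
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        # Optimistic score: estimated reward plus a confidence-width bonus.
        ucb = features @ theta + self.beta * np.sqrt(
            np.einsum('ij,jk,ik->i', features, A_inv, features))
        return int(np.argmax(ucb))

    def update(self, phi, reward, sigma2_t):
        # Weight each round by 1 / sigma2_t, so low-noise observations
        # contribute more to the least-squares estimate.
        self.A += np.outer(phi, phi) / sigma2_t
        self.b += phi * reward / sigma2_t
```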

[2312.08177] Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression

This paper has been withdrawn by Peilin Cai. [Submitted on 13 Dec 2023 (v1), last revised 10 Mar 2025 (this version, v2)] Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression, by Peilin Cai. Abstract: This paper investigates the application of advanced image segmentation techniques to analyze C-fos immediate early gene expression, a crucial marker for neural activity. Due to the complexity and high variability of neural circuits, accurate segmentation of C-fos images is paramount for the development of new insights

Multimodal Large Language Models are Shape-Blind

[Submitted on 21 Feb 2025 (v1), last revised 11 Mar 2025 (this version, v2)] Forgotten Polygons: Multimodal Large Language Models are Shape-Blind, by William Rudman and 6 other authors. Abstract: Despite strong performance on vision-language tasks, Multimodal Large Language Models (MLLMs) struggle with mathematical problem-solving, with both open-source and state-of-the-art models falling short of human performance on visual-math benchmarks. To systematically examine visual-mathematical reasoning in MLLMs, we (1) evaluate their understanding of geometric primitives, (2) test multi-step reasoning, and (3) explore a potential solution to improve visual reasoning capabilities. Our