Articles for category: AI Research

Using AI to expand global access to reliable flood forecasts

Floods are the most common natural disaster, and are responsible for roughly $50 billion in annual financial damages worldwide. The rate of flood-related disasters has more than doubled since the year 2000 partly due to climate change. Nearly 1.5 billion people, making up 19% of the world’s population, are exposed to substantial risks from severe flood events. Upgrading early warning systems to make accurate and timely information accessible to these populations can save thousands of lives per year. Driven by the potential impact of reliable flood forecasting on people’s lives globally, we started our flood forecasting effort in 2017. Through

Advancing biomedical discovery: Overcoming data challenges in precision medicine

Introduction. Modern biomedical research is driven by the promise of precision medicine: tailored treatments for individual patients through the integration of diverse, large-scale datasets. Yet, the journey from raw data to actionable insights is fraught with challenges. Our team of researchers at Microsoft Research in the Health Futures group, in collaboration with the Perelman School of Medicine at the University of Pennsylvania, conducted an in-depth exploration of these challenges in a study published in Nature Scientific Reports. The goal of this research was to identify pain points in the biomedical data lifecycle and offer actionable recommendations to

Distill Hiatus

Over the past five years, Distill has supported authors in publishing artifacts that push beyond the traditional expectations of scientific papers. From Gabriel Goh’s interactive exposition of momentum, to an ongoing collaboration exploring self-organizing systems, to a community discussion of a highly debated paper, Distill has been a venue for authors to experiment in scientific communication. But over this time, the editorial team has become less certain whether it makes sense to run Distill as a journal, rather than encourage authors to self-publish. Running Distill as a journal creates a great deal of structural friction, making it hard for us

Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents. Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss; 26(6):1−40, 2025. Abstract: Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as “I packed my bag”. This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To

Deep mutational learning for the selection of therapeutic antibodies resistant to the evolution of Omicron variants of SARS-CoV-2

Design and construction of a high-distance Omicron BA.1 RBD library. A mutagenesis library was constructed based on BA.1, covering the entire 201-amino-acid RBD region (positions 331–531 of SARS-CoV-2 S protein). To maximize the interrogated RBD sequence space, the library design was entirely synthetic and unbiased, as it did not consider evolutionary data or previous experimental findings. For the construction of the library, the RBD sequence was split into 11–12 fragments, each with an approximate length of 48 nucleotides (Supplementary Table 1). For a fragment of average length, 137 different single-stranded oligonucleotides (ssODN) were designed, where each ssODN had

Third-party evaluation to identify risks in LLMs’ training data

TLDR – EleutherAI and OpenMined conducted a demonstration project to show how third-party evaluators can query a non-public AI training dataset. This approach provides third-party evaluators with a new method for conducting AI safety evaluations without accessing the model or sensitive data. The Problem. With the rapid advancement of frontier artificial intelligence (AI) models, establishing effective third-party oversight and evaluation is crucial to ensure their responsible development and maintain public trust. However, many third-party oversight methods primarily rely on black-box access, in which evaluators can only query the system and observe its outputs. This degree of access severely restricts the

Stanford CRFM

In collaboration with SCBX and SCB 10X, we introduce the ThaiExam leaderboard. ThaiExam is a Thai language benchmark derived from standardized examinations in Thailand. It consists of assessments that evaluate general knowledge at the high school level, such as the ONET, TGAT, TPAT-1, and A-Level exams, as well as the IC exam, which assesses financial knowledge among investment professionals. The ThaiExam leaderboard is the first public leaderboard for language models on Thai language scenarios, and features evaluations of leading language models. Like all other HELM leaderboards, the ThaiExam leaderboard provides full prompt-level transparency, and the results can be fully reproduced

Updating the Frontier Safety Framework

Our next iteration of the FSF sets out stronger security protocols on the path to AGI. AI is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery. But as its development progresses, advanced capabilities may present new risks. That’s why we introduced the first iteration of our Frontier Safety Framework last year – a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models. Since then, we’ve collaborated with experts in industry, academia,

[2212.03683] Neighborhood Adaptive Estimators for Causal Inference under Network Interference

[Submitted on 7 Dec 2022 (v1), last revised 4 Mar 2025 (this version, v2)] By Alexandre Belloni and 1 other author. Abstract: Estimating causal effects has become an integral part of most applied fields. In this work we consider the violation of the classical no-interference assumption with units connected by a network. For tractability, we consider a known network that describes how interference may spread. Unlike previous work, the radius (and intensity) of the interference experienced by a unit is unknown