Articles for category: AI Research

Knowledge-guided diffusion model for 3D ligand-pharmacophore mapping

Dataset construction. We constructed two 3D ligand-pharmacophore pair datasets, CpxPhoreSet and LigPhoreSet, for LPM learning, using the enhanced version of AncPhore [23]. CpxPhoreSet was established by analyzing a total of 19,443 protein-ligand complex structures collected in PDBBind (version 2020) [47,48]. We followed a time-split scheme [14] and divided PDBBind into train (16,379 entries), validation (968 entries), and test (363 entries) sets. The train and validation sets were used to establish the CpxPhoreSet, and the remaining test set was used for performance evaluation. For each complex structure, AncPhore was used to generate one pharmacophore model considering 10 pharmacophore feature types (HD, HA, MB,
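
As a minimal sketch of the time-split partitioning described above: entries are divided chronologically rather than randomly, so the test set contains only structures newer than anything used for training. The entry fields and cutoff years below are placeholders, not details from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ComplexEntry:
    pdb_id: str
    year: int  # deposition year used for the chronological (time-based) split

def time_split(entries: List[ComplexEntry], train_until: int, val_until: int
               ) -> Tuple[List[ComplexEntry], List[ComplexEntry], List[ComplexEntry]]:
    """Partition complexes by year: up to `train_until` trains, the next
    window validates, and everything newer is held out for testing."""
    train = [e for e in entries if e.year <= train_until]
    val = [e for e in entries if train_until < e.year <= val_until]
    test = [e for e in entries if e.year > val_until]
    return train, val, test

# Toy entries; the cutoff years are assumptions (the excerpt does not state them).
entries = [ComplexEntry("1abc", 2015), ComplexEntry("2def", 2018), ComplexEntry("3ghi", 2020)]
train, val, test = time_split(entries, train_until=2017, val_until=2018)
print(len(train), len(val), len(test))  # the paper reports 16,379 / 968 / 363 on full PDBBind
```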

Mechanistic Anomaly Detection Research Update 2

Previously we discussed our progress in testing some approaches to mechanistic anomaly detection (MAD). This is a short update on progress since then. We found that anomaly detection performance on non-arithmetic tasks was much worse for Llama 3.1 8B trained in the same way as Mistral 7B v0.1, the model that we were using previously. In the cases where anomaly detection did not work well, Llama was somewhat less quirky than Mistral, but it still exhibited the desired quirky behaviour and achieved lower loss on average across the tasks. We found that the distance between the centroids of Alice and Bob contexts
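
For context on the centroid measurement mentioned above, here is a minimal sketch of comparing activation centroids between two context distributions ("Alice" and "Bob") and scoring new points by distance to the trusted centroid. The layers, metric, and thresholds used in the actual experiments are not given in this excerpt; the Euclidean distance below is an assumption for illustration.

```python
import numpy as np

def centroid_distance(alice_acts: np.ndarray, bob_acts: np.ndarray) -> float:
    """Distance between the mean activation vectors (centroids) of two context
    distributions; larger values suggest the contexts are more separable."""
    return float(np.linalg.norm(alice_acts.mean(axis=0) - bob_acts.mean(axis=0)))

def anomaly_score(x: np.ndarray, trusted_acts: np.ndarray) -> float:
    """Score a new activation by its distance to the trusted (e.g. Alice) centroid;
    points far from that centroid are flagged as potentially anomalous."""
    return float(np.linalg.norm(x - trusted_acts.mean(axis=0)))

rng = np.random.default_rng(0)
alice = rng.normal(0.0, 1.0, size=(100, 16))  # toy activations from Alice contexts
bob = rng.normal(0.5, 1.0, size=(100, 16))    # toy activations from Bob contexts
print(centroid_distance(alice, bob))
print(anomaly_score(bob[0], alice))
```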

Stanford CRFM

We introduce Cybench, a benchmark consisting of 40 cybersecurity tasks from professional CTF competitions. Key takeaways: The impact of cybersecurity agents will continue to expand with increasing language model capabilities. They have the potential not only to identify vulnerabilities but also to execute exploits. We introduce a benchmark to quantify the capabilities and risks of cybersecurity agents, with 40 professional-level CTF tasks that are recent, meaningful, and span a wide range of difficulties. As many tasks are beyond current agent capabilities, we introduce the concept of subtasks, which break a task down into individual steps for more gradated evaluation. We develop
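
To illustrate the subtask idea, here is a small sketch of scoring an agent on a task broken into ordered subtasks, giving partial credit when the final flag is out of reach. The field names and scoring scheme are hypothetical, not Cybench's actual task format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Subtask:
    question: str   # intermediate step, e.g. "Which cipher is used?"
    answer: str     # expected answer checked against the agent's response

@dataclass
class Task:
    name: str
    flag: str                 # final CTF flag for unguided, full-task evaluation
    subtasks: List[Subtask]   # ordered steps for gradated evaluation

def subtask_score(task: Task, agent_answers: List[str]) -> float:
    """Fraction of subtasks answered correctly."""
    correct = sum(a.strip() == s.answer for a, s in zip(agent_answers, task.subtasks))
    return correct / len(task.subtasks)

task = Task("toy-crypto", "flag{placeholder}",
            [Subtask("Which cipher is used?", "RSA"),
             Subtask("What is the public exponent?", "3")])
print(subtask_score(task, ["RSA", "65537"]))  # 0.5
```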

The Visual Haystacks Benchmark! – The Berkeley Artificial Intelligence Research Blog

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes within single images and answer related questions. While recent advancements in foundation models have significantly closed the gap between human and machine visual processing, conventional VQA has been restricted to reasoning about single images at a time rather than whole collections of visual data. This limitation poses challenges in more complex scenarios. Take, for example, the challenges of discerning patterns in collections of medical

FACTS Grounding: A new benchmark for evaluating the factuality of large language models

Responsibility & Safety · Published 17 December 2024 · Authors: FACTS team. Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations. Large language models (LLMs) are transforming how we access information, yet their grip on factual accuracy remains imperfect. They can “hallucinate” false information, particularly when given complex inputs. In turn, this can erode trust in LLMs and limit their applications in the real world. Today, we’re introducing FACTS Grounding, a comprehensive benchmark for evaluating the ability of LLMs to generate responses that are not only factually

Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

[Submitted on 7 Oct 2024 (v1), last revised 5 Mar 2025 (this version, v2)] From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency, by Kaiyue Wen and 3 other authors. Abstract: Chain-of-thought (CoT) significantly enhances the reasoning performance of large language models (LLM). While current theoretical studies often attribute this improvement to increased expressiveness and computational capacity, we argue that expressiveness is not the primary limitation in the LLM regime, as current large models will fail on simple tasks. Using a parity-learning setup, we demonstrate that
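
As a rough illustration of the kind of parity-learning data such an analysis considers: each input is a bit string whose label is the parity (XOR) of a hidden subset of positions, and a chain-of-thought variant additionally exposes the running partial parities as intermediate targets. This is a generic sketch, not the paper's exact construction.

```python
import random

def make_parity_example(n_bits: int, secret: list, with_cot: bool, rng: random.Random):
    """Label = parity of the bits at the hidden `secret` positions.
    With CoT, the target also spells out the running parity step by step."""
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    partials, acc = [], 0
    for i in secret:
        acc ^= bits[i]
        partials.append(acc)
    target = partials if with_cot else [partials[-1]]  # CoT exposes intermediate steps
    return bits, target

rng = random.Random(0)
secret = [1, 4, 7]                                       # hidden relevant positions
x, y_direct = make_parity_example(10, secret, with_cot=False, rng=rng)
x2, y_cot = make_parity_example(10, secret, with_cot=True, rng=rng)
print(x, y_direct)   # final parity only
print(x2, y_cot)     # step-by-step partial parities
```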

[2405.17859] Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

[Submitted on 28 May 2024 (v1), last revised 5 Mar 2025 (this version, v3)] By Yangxiao Lu and 4 other authors. Abstract: Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified, simple, yet effective framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we
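
A rough sketch of the proposal-embedding-matching step the abstract outlines: each region proposal is assigned the label of its most similar instance template by cosine similarity. The proposal generator, embedding model, and threshold are left out or assumed here; this is not the actual NIDS-Net implementation.

```python
import numpy as np

def match_proposals(template_embs: np.ndarray, template_labels: list,
                    proposal_embs: np.ndarray, threshold: float = 0.5):
    """Assign each region proposal the instance label of its nearest template
    embedding (cosine similarity); below-threshold proposals are rejected (None)."""
    t = template_embs / np.linalg.norm(template_embs, axis=1, keepdims=True)
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    sims = p @ t.T                          # proposals x templates similarity matrix
    best = sims.argmax(axis=1)
    results = []
    for i, j in enumerate(best):
        label = template_labels[j] if sims[i, j] >= threshold else None
        results.append((label, float(sims[i, j])))
    return results

rng = np.random.default_rng(0)
templates = rng.normal(size=(3, 8))                          # toy template embeddings
proposals = templates[[0, 2]] + 0.05 * rng.normal(size=(2, 8))
print(match_proposals(templates, ["mug", "drill", "box"], proposals))
```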

[2409.07402] What to align in multimodal contrastive learning?

[Submitted on 11 Sep 2024 (v1), last revised 5 Mar 2025 (this version, v2)] By Benoit Dufumier and 3 other authors. Abstract: Humans perceive the world through multisensory integration, blending the information of different modalities to adapt their behavior. Contrastive learning offers an appealing solution for multimodal self-supervised learning. Indeed, by considering each modality as a different view of the same entity, it learns to align features of different modalities in a shared representation space. However, this approach is intrinsically limited as it
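
For reference, a minimal sketch of the standard cross-modal contrastive objective the abstract alludes to: paired samples from two modalities are pulled together in a shared space while mismatched pairs are pushed apart (a symmetric InfoNCE loss). This is the generic formulation the paper critiques, not its proposed alternative.

```python
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric contrastive loss: row i of z_a should match row i of z_b
    (the same entity seen through two modalities) and mismatch every other row."""
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # cross-entropy with the diagonal as the correct pairing, in both directions
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_ab = -np.mean(np.diag(log_softmax))
    log_softmax_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_ba = -np.mean(np.diag(log_softmax_t))
    return float((loss_ab + loss_ba) / 2)

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))                   # toy image embeddings
txt = img + 0.1 * rng.normal(size=(4, 16))       # paired text embeddings (roughly aligned)
print(info_nce(img, txt))
```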

[2411.15684] Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing

[Submitted on 24 Nov 2024 (v1), last revised 5 Mar 2025 (this version, v2)] By Zheng Ma and 7 other authors. Abstract: Data-Independent Acquisition (DIA) was introduced to improve sensitivity to cover all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not very clear how useful DIA data is for de novo peptide sequencing, as the DIA data are marred with coeluted peptides, high noise, and varying

Expanding AI Overviews and introducing AI Mode

Helping people discover content from the web remains central to our approach, and with AI Mode we’re making it easy for people to explore and take action. With the model’s deep information retrieval, people can better express what they’re looking for — with all their nuances and constraints — and get to the right web content in a range of formats. Testing in Labs: We’ve been getting feedback internally and from trusted testers, and they’ve found AI Mode incredibly helpful; they particularly appreciate the speed, quality and freshness of responses. Now, we’re expanding our testing with a limited, opt-in experience