Articles for category: AI Research

March 14, 2025

Genie 2: A large-scale foundation world model

Acknowledgements: Genie 2 was led by Jack Parker-Holder with technical leadership by Stephen Spencer, with key contributions from Philip Ball, Jake Bruce, Vibhavari Dasagi, Kristian Holsheimer, Christos Kaplanis, Alexandre Moufarek, Guy Scully, Jeremy Shar, Jimmy Shi and Jessica Yung, and contributions from Michael Dennis, Sultan Kenjeyev and Shangbang Long. Yusuf Aytar, Jeff Clune, Sander Dieleman, Doug Eck, Shlomi Fruchter, Raia Hadsell, Demis Hassabis, Georg Ostrovski, Pieter-Jan Kindermans, Nicolas Heess, Charles Blundell, Simon Osindero and Rushil Mistry gave advice. Past contributors include Ashley Edwards and Richie Steigerwald. The Generalist Agents team was led by Vlad Mnih with key contributions from Harris Chan,

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

[2409.19975] Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples

[Submitted on 30 Sep 2024 (v1), last revised 12 Mar 2025 (this version, v2)] Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples, by NR Rahul and 1 other author. Abstract: We consider a sequential multi-task problem, where each task is modeled as the stochastic multi-armed bandit with K arms. We assume the bandit tasks are adjacently similar in the sense that the difference between the mean rewards of the arms for any two consecutive tasks is bounded by a parameter. We propose two algorithms (one assumes

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision

[Submitted on 12 Dec 2024 (v1), last revised 13 Mar 2025 (this version, v2)] LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision, by Chunyu Li and 8 other authors. Abstract: End-to-end audio-conditioned latent diffusion models (LDMs) have been widely adopted for audio-driven portrait animation, demonstrating their effectiveness in generating lifelike and high-resolution talking videos. However, direct application of audio-conditioned LDMs to lip-synchronization (lip-sync) tasks results in suboptimal lip-sync accuracy. Through an in-depth analysis, we identified the underlying cause as the “shortcut learning problem”, wherein the model

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

Natural Language Descriptions for Expressive 3D Human Motions

[Submitted on 19 Dec 2023 (v1), last revised 12 Mar 2025 (this version, v3)] MotionScript: Natural Language Descriptions for Expressive 3D Human Motions, by Payam Jome Yazdian and 5 other authors. Abstract: We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions. Unlike existing motion datasets that rely on broad action labels or generic captions, MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement including expressive actions (e.g., emotions, stylistic walking) and interactions beyond standard motion capture datasets. MotionScript

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

[2406.10714] Planning with Adaptive World Models for Autonomous Driving

[Submitted on 15 Jun 2024 (v1), last revised 12 Mar 2025 (this version, v3)] Planning with Adaptive World Models for Autonomous Driving, by Arun Balajee Vasudevan and 3 other authors. Abstract: Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

iOS features and AI Overviews

Every month, Google Lens is used for more than 20 billion visual searches. And now we’re introducing two updates for Lens that’ll make it even easier to search what you see, across more apps and devices. First, if you have an iPhone, you’ll find a new Lens option that lets you select and search what’s on your screen within Chrome or the Google app, using whatever gesture comes naturally — like drawing, highlighting or tapping. Whether you’re reading an article, shopping for a product or watching a video, you can use this feature to quickly perform a visual search while

March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research

Reddit – Heart of the internet


March 14, 2025

ikayaniaamirshahzad@gmail.com

AI Research