March 5, 2025
Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
[Submitted on 10 Oct 2024 (v1), last revised 2 Mar 2025 (this version, v2)]
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization, by Yougang Lyu and 6 other authors

Abstract: As large language models (LLMs) are rapidly advancing and achieving near-human capabilities on specific tasks, aligning them with human values is becoming more urgent. In scenarios where LLMs outperform humans, we face a weak-to-strong alignment problem where we need to effectively align strong student LLMs through weak supervision generated by weak teachers. Existing alignment methods mainly focus on strong-to-weak alignment