March 13, 2025
NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation
The Transformer architecture, introduced by Vaswani et al. in 2017, serves as the backbone of contemporary language models. Over the years, numerous modifications to this architecture have been proposed to improve training stability, inference efficiency, context length, and robustness. In the new paper nGPT: Normalized Transformer with Representation Learning on the Hypersphere, an NVIDIA research team proposes the normalized Transformer (nGPT), which consolidates key findings in Transformer research under a unified framework: the vectors forming the embeddings, MLP, attention matrices, and hidden states are normalized to unit norm, so all representations live on the surface of a hypersphere. The authors report that nGPT learns significantly faster, reducing the number of training steps required to reach the same accuracy by a factor of 4 to 20, depending on sequence length. A minimal sketch of the core normalization idea follows.
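The short PyTorch sketch below illustrates the central mechanism in isolation. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper ngpt_block_update, the stand-in fake_attn layer, and the constant alpha step size are hypothetical names chosen for clarity; in the paper, the per-layer step sizes (the "eigen learning rates") are learnable parameters.

import torch
import torch.nn.functional as F

def normalize(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Project vectors onto the unit hypersphere (L2 normalization)."""
    return F.normalize(x, p=2, dim=dim)

def ngpt_block_update(h, sublayer, alpha):
    """One illustrative nGPT-style residual step (an assumption, not the
    paper's exact code): rather than computing h + sublayer(h), move h toward
    the normalized sublayer output and re-project onto the unit hypersphere.
    `alpha` plays the role of the paper's learnable per-dimension step size."""
    h_out = normalize(sublayer(h))             # sublayer suggestion, on the sphere
    return normalize(h + alpha * (h_out - h))  # interpolate, then re-normalize

# Toy usage: a random linear map stands in for an attention sublayer.
torch.manual_seed(0)
d_model = 8
h = normalize(torch.randn(2, 4, d_model))      # (batch, seq, dim) on the sphere
fake_attn = torch.nn.Linear(d_model, d_model, bias=False)
alpha = torch.full((d_model,), 0.05)           # illustrative fixed step size
h = ngpt_block_update(h, fake_attn, alpha)
print(h.norm(dim=-1))                          # ~1.0 everywhere: still on the sphere

Because every state is re-normalized after each update, each layer can be read as taking a small step along the hypersphere rather than accumulating unbounded residual activations. The researchers summarize their main contributions as follows: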