![]() |
Cornell Tech and Stanford University researchers introduced **Block Discrete Denoising Diffusion Language Models (BD3-LMs)** to overcome these limitations. This new class of models interpolates between autoregressive and diffusion models by employing a structured approach that supports variable-length generation while maintaining inference efficiency. BD3-LMs use key-value caching and parallel token sampling to reduce computational overhead. The model is designed with specialized training algorithms that minimize gradient variance through customized noise schedules, optimizing performance across diverse language modeling benchmarks. BD3-LMs operate by structuring text generation into blocks rather than individual tokens. Unlike traditional autoregressive models, which predict the next token sequentially, BD3-LMs generate a block of tokens simultaneously, significantly improving efficiency. A diffusion-based denoising process within each block ensures high-quality text generation while preserving coherence. The model architecture integrates transformers with a block-causal attention mechanism, allowing each block to condition on previously generated blocks. This approach enhances both contextual relevance and fluency. The training process includes a vectorized implementation that enables parallel computations, reducing training time and resource consumption. Researchers introduced data-driven noise schedules that stabilize training and improve gradient estimation to address the high variance issue in diffusion models……. Paper: https://arxiv.org/abs/2503.09573 GitHub Page: https://github.com/kuleshov-group/bd3lms Project: https://m-arriola.com/bd3lms/ submitted by /u/ai-lover |