March 14, 2025
Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]
In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of large language models (LLMs), such as reasoning or in-context learning. Prior work (e.g., Liu et al., 2023) and our own experiments show that these phenomena cannot be explained by reaching a globally minimal test loss, the target of statistical generalization. In other words, comparing models solely by their test loss is nearly meaningless. We identified three areas where more research is required: understanding the role of inductive biases in LLM training, including architecture,