![]() |
This benchmark is a more challenging version of the original NYT Connections benchmark (which was approaching saturation and required identifying only three categories, allowing the fourth to fall into place), with additional words added to each puzzle. To safeguard against training data contamination, I also evaluate performance exclusively on the most recent 100 puzzles. In this scenario, o1-pro remains in first place.More info: GitHub: NYT Connections Benchmark submitted by /u/zero0_one1 |