The gap between closed-source and open-source models is about 7 months.
View attachment 166706
It stayed unchanged over the year. Closed source didn't extend its lead, but open source didn't close the gap either. If the trend holds, we will likely see a Gemini 3-class open-source model next summer, or by autumn at the latest. Exciting.
This benchmark is known to be biased. It's supposedly closed, but they straight up give the questions to OpenAI, since OpenAI is their biggest investor.
They also purposely avoid the official APIs for Chinese models and call 3rd-party endpoints instead, to prevent leaks. The problem is that 3rd-party endpoints may not be as well optimized as the official ones. Yet they have no issue calling the official APIs for US models, so their questions can still leak to American labs, who can then benchmaxx.
Also, they test the highest-end, most expensive versions of all the American models, which are optimized for these benchmarks at huge cost, but they do not test DeepSeek-V3.2-Speciale, which would be the Chinese equivalent.