A new Qwen model came out. Its long-context score seems mediocre; you can find it at the very end of the table.
The benchmarks the Qwen team put out appeared to suggest huge improvements, even over K2, but K2 does better on long-context here. As always, labs will cherry-pick benchmarks to make their models look amazing (we saw this with Grok 4 too).
More exciting to me is that the Qwen team has promised big new updates "soon". Personally, I want to see a new version of Qwen Max. We need a big model again, especially now that Kimi has set the bar with their 1T-parameter K2 model.