Yeah, it is based on the STaR method.

BTW, in the last 18 months papers on eliciting reasoning in LLMs have sprung up like mushrooms. Many on the internet say that OpenAI o1 is based on it... the other big players are months away from OpenAI, not years away.
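For anyone who hasn't read the paper, here is a minimal sketch of the STaR loop (Zelikman et al., 2022) as I understand it. `generate_rationale` and `finetune` are hypothetical placeholders standing in for an LLM sampling call and a fine-tuning run, not anyone's actual code:

```python
# Minimal sketch of the STaR (Self-Taught Reasoner) loop.
# `generate_rationale` and `finetune` are hypothetical placeholders,
# not a real API; this only illustrates the control flow.

def star(base_model, dataset, max_rounds=5):
    model = base_model
    for _ in range(max_rounds):
        training_examples = []
        for question, answer in dataset:
            # 1. Ask the model for a chain-of-thought rationale and an answer.
            rationale, predicted = generate_rationale(model, question)
            if predicted == answer:
                # 2. Keep rationales that lead to the correct answer.
                training_examples.append((question, rationale, answer))
            else:
                # 3. "Rationalization": retry with the correct answer given as a
                #    hint; keep the rationale (without the hint) if it now succeeds.
                rationale, predicted = generate_rationale(model, question, hint=answer)
                if predicted == answer:
                    training_examples.append((question, rationale, answer))
        # 4. Fine-tune the original base model on the collected rationales,
        #    then generate again with the improved model.
        model = finetune(base_model, training_examples)
    return model
```

The key trick is the filter in step 2: the model is only trained on reasoning traces that actually produced the right answer, so each round bootstraps better rationales from the previous one.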
Many of the authors of that paper are now at xAI (Elon's company).
Qwen 2.5 has better performance due to more data.

What is most impressive to me is that with a 70B model trained on 18T tokens they reached the performance of the 405B Llama model trained on 15T tokens.
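For a sense of scale, here is a back-of-the-envelope sketch using the common C ≈ 6·N·D training-FLOPs approximation. The parameter counts and token counts are just the ones quoted above, so treat this as an order-of-magnitude comparison, not official figures:

```python
# Rough training-compute comparison using the C ~= 6 * N * D approximation,
# where N = parameter count and D = training tokens.
# Numbers are the ones quoted in the comment above, not official figures.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs as 6 * parameters * tokens."""
    return 6 * params * tokens

qwen_flops = train_flops(70e9, 18e12)    # ~70B params, ~18T tokens
llama_flops = train_flops(405e9, 15e12)  # ~405B params, ~15T tokens

print(f"70B / 18T run:   {qwen_flops:.2e} FLOPs")   # ~7.6e+24
print(f"405B / 15T run:  {llama_flops:.2e} FLOPs")  # ~3.6e+25
print(f"Ratio:           {llama_flops / qwen_flops:.1f}x")  # ~4.8x
```

So by this crude estimate the 405B run used roughly 5x the training compute, yet the smaller model matched it.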
A better-filtered, bigger corpus and more extensive RLHF data annotation.
If only they had a bigger cluster and more compute, they would have crushed 4o and Sonnet 3.5 as well.