OpenAI's next hat trick is probably going to be Test Time Training instead of just Test Time Compute...
That's fine, but we actually need to see the results of how good it is. Having more open-source reasoning models basically defeats OpenAI's business case.
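For anyone wondering about the distinction: test-time compute keeps the weights frozen and just spends more forward passes (longer chains of thought, more samples), while test-time training takes a few gradient steps on the test input itself before answering. A toy PyTorch sketch of the idea (purely illustrative, not OpenAI's actual method):

```python
# Toy sketch: test-time compute vs. test-time training.
# Not OpenAI's method; just illustrates the distinction.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for a real LM
x = torch.randn(1, 16)     # stand-in for the test input

# Test-time compute: weights frozen, we just spend more forward passes
# (e.g. sampling many reasoning chains and picking the best one).
with torch.no_grad():
    candidates = [model(x) for _ in range(8)]

# Test-time training: take a few gradient steps on a self-supervised
# objective built from the test input itself, *then* predict.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(3):
    loss = ((model(x) - x) ** 2).mean()  # toy reconstruction objective
    opt.zero_grad()
    loss.backward()
    opt.step()

prediction = model(x)  # now made with per-input adapted weights
```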
Marco-o1 seems to be worse than Deepseek. Makes sense, as it is a very low-parameter model: 7B.
Seems like a different division of Alibaba altogether, not associated with Qwen. Plus they haven't done the RL part yet, so it's not going to be strong at reasoning until they do.
To be more exact, it's a Qwen2-7B-Instruct fine-tune
But these new models, from what I can see, are not using o1-like reasoning/thinking techniques where they use inference-time compute to "think".
Sounds to me like everyone is now jumping the gun to announce something ASAP, even if it's not finished yet or still baking.
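For context, "using inference-time compute to think" usually means something like best-of-n sampling or search over reasoning chains, scored by a verifier. A minimal sketch with hypothetical generate()/score() stand-ins (a real setup would call an LM and a reward model):

```python
# Rough sketch of best-of-n sampling, one common form of spending
# inference-time compute. generate() and score() are hypothetical
# stand-ins for an LM and a verifier/reward model.
import random

def generate(prompt: str) -> str:
    # stand-in for sampling one chain-of-thought + answer from an LM
    return f"{prompt} -> candidate {random.randint(0, 999)}"

def score(completion: str) -> float:
    # stand-in for a reward model / verifier
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 17 * 23?"))
```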
Facebook scrambling too
https://www.reddit.com/r/LocalLLaMA/comments/1gxxj4w
Qwen needs to pump out a Q1 that's 72B in size; it would definitely surpass o1-preview.
Qwen3...
ClosedAI is dragging its feet and holding back. At this rate Qwen is going to do to o1 what Kling did to Sora.
DeepSeek is not interested in making money from AI. They want to be attractive to prospective talent, to college graduates. Rule of Cool. Otherwise, why would an HFT firm spend money on LLMs?
More like Meta is throwing a bunch of stuff at the wall, and will wait to see which will stick...
They might have some sort of what you ask internally and they released the small preliminary version to let the community test it out. Tbh if they release bigger versions, I am curious if they will allow full access to the thinking process as that would be extremely valuable data that would allow competitors to train on.
Still a bit shocked that Deepseek shows its full thinking process; let's see how the open-source/weights model release will look, though.
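If the weights do ship with visible reasoning, the output will presumably delimit it somehow. A small sketch of splitting the reasoning from the answer, assuming <think>...</think> delimiters (an assumption on my part; the actual release may format it differently):

```python
# Minimal sketch of separating a model's visible reasoning from its
# final answer. The <think>...</think> delimiter is an assumption;
# the released weights may use a different format.
import re

def split_thinking(raw: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = raw[match.end():].strip()
    return thinking, answer

raw = "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51</think>The answer is 391."
thinking, answer = split_thinking(raw)
print(thinking)  # the chain-of-thought, valuable as training data
print(answer)    # the user-facing answer
```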
As for OpenAI, the gap is closing. Great work by the Chinese AI labs. In fact, if not for OpenAI's existence, I would have said that China is leading the AI race rn
This is an uncensored Qwen2.5 72B, ranking third on that leaderboard.
The new 389B-parameter model by Tencent is quite large and probably takes quite a bit of hardware to run.
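Back-of-the-envelope math on why: weight memory alone is roughly parameter count times bytes per parameter, before you even count the KV cache or runtime overhead. A quick sketch:

```python
# Rough weights-only memory estimate for a 389B-parameter model at
# different quantization levels (ignores KV cache and overhead).
PARAMS = 389e9

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:,.0f} GB")

# FP16: ~778 GB, Q8: ~389 GB, Q4: ~195 GB -- far beyond a single GPU.
```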
btw, I took a look at the Hugging Face leaderboard today and it's still mostly Qwen-2.5 derivatives.