OpenAI's next hat trick is probably going to be Test Time Training instead of just Test Time Compute...
That's fine, but we actually need to see the results of how good it is. Having more open-source reasoning models basically defeats OpenAI's business case.
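For anyone wondering about the distinction: test-time compute keeps the weights frozen and just spends more forward passes (longer chains of thought, more samples), while test-time training takes a few gradient steps on the test input itself before answering. A toy PyTorch sketch of the idea (purely illustrative, not OpenAI's actual method):

```python
# Toy sketch: test-time compute vs. test-time training.
# Not OpenAI's method; just illustrates the distinction.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for a real LM
x = torch.randn(1, 16)     # stand-in for the test input

# Test-time compute: weights frozen, we just spend more forward passes
# (e.g. sampling many reasoning chains and picking the best one).
with torch.no_grad():
    candidates = [model(x) for _ in range(8)]

# Test-time training: take a few gradient steps on a self-supervised
# objective built from the test input itself, *then* predict.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(3):
    loss = ((model(x) - x) ** 2).mean()  # toy reconstruction objective
    opt.zero_grad()
    loss.backward()
    opt.step()

prediction = model(x)  # now made with per-input adapted weights
```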
Marco-o1 seems to be worse than Deepseek. Makes sense, as it is a very low-parameter model: 7B.
Seems like a different division of Alibaba altogether, not associated with Qwen. Plus they haven't done the RL part yet, so it's not going to be strong at reasoning until they do.
To be more exact, it's a Qwen2-7B-Instruct fine-tune
But these new models, from what I can see, are not using o1-like reasoning/thinking techniques where they use inference-time compute to "think".
Sounds to me like everyone is now jumping the gun to announce something ASAP, even if it's not finished yet or still baking.
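For context, "using inference-time compute to think" usually means something like best-of-n sampling or search over reasoning chains, scored by a verifier. A minimal sketch with hypothetical generate()/score() stand-ins (a real setup would call an LM and a reward model):

```python
# Rough sketch of best-of-n sampling, one common form of spending
# inference-time compute. generate() and score() are hypothetical
# stand-ins for an LM and a verifier/reward model.
import random

def generate(prompt: str) -> str:
    # stand-in for sampling one chain-of-thought + answer from an LM
    return f"{prompt} -> candidate {random.randint(0, 999)}"

def score(completion: str) -> float:
    # stand-in for a reward model / verifier
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 17 * 23?"))
```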
Facebook scrambling too
https://www.reddit.com/r/LocalLLaMA/comments/1gxxj4w
Qwen needs to pump out a Q1 that's 72B in size; it would definitely surpass o1-preview.
Qwen3...
ClosedAI is dragging its feet and holding back. At this rate Qwen is going to do to o1 what Kling did to Sora.
DeepSeek is not interested in making money from AI. They want to be attractive to prospective talent, to college graduates. Rule of Cool. Otherwise, why would an HFT firm spend money on LLMs?
More like Meta is throwing a bunch of stuff at the wall, and will wait to see which will stick...
They might have some sort of what you ask internally and they released the small preliminary version to let the community test it out. Tbh if they release bigger versions, I am curious if they will allow full access to the thinking process as that would be extremely valuable data that would allow competitors to train on.
Still a bit shocked that Deepseek shows its full thinking process; let's see how the open-source/weights model release will look, though.
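If the weights do ship with visible reasoning, the output will presumably delimit it somehow. A small sketch of splitting the reasoning from the answer, assuming <think>...</think> delimiters (an assumption on my part; the actual release may format it differently):

```python
# Minimal sketch of separating a model's visible reasoning from its
# final answer. The <think>...</think> delimiter is an assumption;
# the released weights may use a different format.
import re

def split_thinking(raw: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = raw[match.end():].strip()
    return thinking, answer

raw = "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51</think>The answer is 391."
thinking, answer = split_thinking(raw)
print(thinking)  # the chain-of-thought, valuable as training data
print(answer)    # the user-facing answer
```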
As for OpenAI, the gap is closing. Great work by the Chinese AI labs. In fact, if not for OpenAI's existence, I would have said that China is leading the AI race rn
This is an uncensored Qwen2.5 72B, ranking third on that leaderboard.
The new 389B-parameter model by Tencent is quite large and probably takes quite a bit of hardware to run.
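Back-of-the-envelope math on why: weight memory alone is roughly parameter count times bytes per parameter, before you even count the KV cache or runtime overhead. A quick sketch:

```python
# Rough weights-only memory estimate for a 389B-parameter model at
# different quantization levels (ignores KV cache and overhead).
PARAMS = 389e9

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    print(f"{name}: ~{gb:,.0f} GB")

# FP16: ~778 GB, Q8: ~389 GB, Q4: ~195 GB -- far beyond a single GPU.
```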
btw, I took a look at the Hugging Face leaderboard today and it's still mostly Qwen-2.5 derivatives.