Artificial Intelligence thread

european_guy

Junior Member
Registered Member

My commentary on Qwen2.5-Max. I think they got a little unlucky here, with DeepSeek beating them to the punch and stealing everyone's heart away. They may have a better model now than V3, but it came a month too late and without a comparable reasoning model, which is the part everyone finds so addictive.

This model has the potential to become world-class.


It has been pre-trained on 20T tokens!!!

That is 20,000 billion tokens. For reference, Llama 3 405B was trained on 15T tokens, a record at the time. Maybe OpenAI's o1 and Google's Gemini are also trained on similar token budgets, but they don't disclose this important info.

After the pre-training phase you get what is called the base model. Their base model is already world class, as we can see in this comparison table of base models (OpenAI, Anthropic and Google don't give access to their base models). In particular, it is stronger than DeepSeek-V3 Base, the base model behind R1.


[Image: Qwen2.5-Max.jpeg (base model benchmark comparison)]


Pre-training is by far the most resource-consuming phase of training, and it is when the model accumulates its knowledge of the world.
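
To put some numbers on that: a common rule of thumb is that pre-training compute is roughly 6 x parameters x tokens in FLOPs. Here is a back-of-envelope in Python using the Llama 3 405B figures mentioned above (Qwen2.5-Max's parameter count is not public, and the GPU throughput and count below are purely illustrative assumptions):

Code:
# Rule of thumb: pre-training FLOPs ~= 6 * parameters * tokens
params = 405e9          # Llama 3 405B: 4.05e11 parameters
tokens = 15e12          # trained on ~15T tokens

flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs")                   # ~3.6e25 FLOPs

# Assume (illustrative) 400 TFLOP/s sustained per GPU and 16,000 GPUs
gpu_flops = 400e12
days = flops / gpu_flops / 16_000 / 86_400
print(f"~{days:.0f} days on 16,000 GPUs")     # on the order of two months

No post-training run comes anywhere near that budget, which is the point of the next paragraph.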

The following phase, called fine-tuning or post-training, is much lighter from a resource-budget POV, but it is the key one that makes all this knowledge come to fruition and lets all the good capabilities of the model emerge (including reasoning).

A strong base model will always develop, after post-training, into a strong instruct (i.e. finished) model.

So, now that the recipe for reasoning is out in the open (btw, the recipe lives in post-training, not pre-training), we can be very confident that this base model will evolve into a top-class model better than R1 within 2-3 months.

Our base models have demonstrated significant advantages across most benchmarks, and we are optimistic that advancements in post-training techniques will elevate the next version of Qwen2.5-Max to new heights.
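
For anyone curious what "the recipe" actually looks like: the published R1 report is essentially large-scale RL on top of the base model, with rule-based verifiable rewards and group-relative advantages (GRPO). Below is only a toy sketch of the advantage computation to illustrate the idea, not DeepSeek's actual code; the verifier and example prompt are made up:

Code:
# Toy GRPO-style advantage computation (illustrative sketch, not DeepSeek's code).
# For each prompt, sample a group of answers, score them with a rule-based
# verifier, then normalize rewards within the group. Correct answers end up
# with positive advantages, wrong ones with negative.
from statistics import mean, pstdev

def verify(answer: str, ground_truth: str) -> float:
    # Rule-based reward: 1.0 if the final answer matches exactly, else 0.0
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def group_advantages(answers: list[str], ground_truth: str) -> list[float]:
    rewards = [verify(a, ground_truth) for a in answers]
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:             # all equally right or wrong -> no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

samples = ["84", "74", "84", "It is 85"]       # 4 sampled completions for "12*7?"
print(group_advantages(samples, "84"))         # [1.0, -1.0, 1.0, -1.0]

In the full pipeline these advantages weight a policy-gradient update on the sampled tokens (plus a KL penalty against a reference model), but the expensive part, the base model, is already done. That is why a strong base matters so much.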
 

european_guy

Junior Member
Registered Member
This is the CTO of OpenAI (after Ilya Sutskever and many others were kicked out last year).


Now he reveals that they had found the secret RL recipe that fosters the emergence of thinking before DS did. What DeepSeek found and published was already at the base of o1.

So, you can choose to keep confidential a very important piece of knowledge that would otherwise help the entire AI world to progress... it is a business decision, maybe not something people will applaud you for, but understandable from a business POV.

But why disclose it now? Why tell everybody "I found it before! I just didn't tell anybody"? Why speak up only now that DS has already published it openly?

This is very embarrassing and IMHO very sad of them: better to keep quiet and reflect on your choices.
 

ougoah

Brigadier
Registered Member
Can some knowledgeable members please give some insight into where Qwen, Pangu, and Bagualu stand in the world of AI?

From what I understand, and iirc, these three are all multimodal language models? From Alibaba, Huawei, and Tsinghua University no less, so not exactly poorly funded ventures. Pangu has been used by Europeans for weather forecasting outside of China. Qwen has been a pretty solid LLM in benchmark testing. Bagualu has over a trillion parameters.

What about the other giants? Does Tencent have a language model?

Great tools, but glad that Sam Altman recently said AGI is nowhere near a reality when responding to the AI fanboys who got hyped up by his Christmas tweets.
 

siegecrossbow

General
Staff member
Super Moderator
This is the CTO of OpenAI (after Ilya Sutskever and many others were kicked out last year).


Now he reveals that they had found the secret RL recipe that fosters the emergence of thinking before DS did. What DeepSeek found and published was already at the base of o1.

So, you can choose to keep confidential a very important piece of knowledge that would otherwise help the entire AI world to progress... it is a business decision, maybe not something people will applaud you for, but understandable from a business POV.

But why disclose it now? Why tell everybody "I found it before! I just didn't tell anybody"? Why speak up only now that DS has already published it openly?

This is very embarrassing and IMHO very sad of them: better to keep quiet and reflect on your choices.

CHAD stole NGAD’s ideas. This is why NGAD got delayed.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Hailuo AI
"Introducing Hailuo T2V-01-Director Model: Control Your Camera Like a Pro!"
...
...
Hunyuan introduced a Blender plugin for Hunyuan3D 2.0
The links you have posted in the past 3 posts have mostly been posted before. Please check previous posts before just reposting everything again. You are being watched.
 

Temstar

Brigadier
Registered Member

Here is a $6,000 local setup that runs R1 without using any GPU.

Yeah, I don't buy the need for this many huge, overpriced data centers. Not when inference can become decentralized with open-source models.
6-8 tokens per second though, usable for a single user but a bit slow.

Are models mostly limited by the size of the RAM or the speed of the RAM? Because if it's just a case of needing a lot of RAM, you could go for things like Intel Optane?
Then you wouldn't even need server boards and EPYC CPUs?
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
6-8 tokens per second though, usable for a single user but a bit slow.

Are models mostly limited by the size of the RAM or the speed of the RAM? Because if it's just a case of needing a lot of RAM, you could go for things like Intel Optane?
Then you wouldn't even need server boards and EPYC CPUs?
Depending on your use case, you can just kick it off to run in the background overnight with a bunch of prompts.

I'm sure you can spend more money to run faster.
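
On the RAM size vs. speed question: for token generation, the active weights have to be streamed from memory for every single token, so decode speed is roughly memory bandwidth divided by bytes of active parameters; capacity only decides whether the model fits at all. A rough sketch with illustrative numbers (not measurements of that $6,000 build):

Code:
# Decode-speed estimate: tokens/s ~= memory bandwidth / bytes read per token.
# R1 is a MoE model: ~37B of its 671B parameters are active per token.
active_params = 37e9        # active parameters per generated token
bytes_per_param = 1.0       # assume ~8-bit quantization (1 byte per weight)
bandwidth = 400e9           # ~400 GB/s, e.g. a many-channel DDR5 server (assumed)

tokens_per_s = bandwidth / (active_params * bytes_per_param)
print(f"~{tokens_per_s:.0f} tokens/s")   # ~11 tokens/s, same ballpark as the 6-8 above

That is also why Optane wouldn't help here: it adds cheap capacity, but its bandwidth is well below DRAM, so generation would get slower, not faster. More memory channels (hence the server boards and EPYC CPUs) or GPUs with HBM is what buys speed.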
 