Artificial Intelligence thread

Overbom

Brigadier
Registered Member
DeepSeek needs to put out a reasoning-tuned version of V3 685B to surpass o1 and catch up with o3.
They already released one before, but I was not super impressed tbh.

However, if they take this top model and add reasoning to it, then I would expect a huge leap. The open-source community is definitely eating good this year and next.
 

9dashline

Captain
Registered Member
They already released one before, but I was not super impressed tbh.

However, if they take this top model and add reasoning to it, then I would expect a huge leap. The open-source community is definitely eating good this year and next.
Nope, they never released the weights; only Qwen did with QwQ, and the R1 Lite preview was 16B to 32B... a 685B would be way better.

So far they have released the base and instruct-tuned versions of the 685B V3, but no model cards yet and no official announcement. I hope they release a reasoning version today that surpasses o1.
 

Overbom

Brigadier
Registered Member
Nope, they never released the weights; only Qwen did with QwQ, and the R1 Lite preview was 16B to 32B... a 685B would be way better.
A 685B reasoning model would actually set the data center on fire lol

But yeah, definitely smart that they made it MoE. Looks like a nice architecture for reasoning, given how many output tokens reasoning needs.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Actually insane how low the training costs were for DeepSeek V3. Trained on only 2000 H800 for around 2 months. Total cost: $5.6 million
btw, I would find that claim to be unlikely.

They already released one before, but I was not super impressed tbh.

However, if they take this top model and add reasoning to it, then I would expect a huge leap. The open-source community is definitely eating good this year and next.

I'm sure they will get there. After all, these Chinese shops still just have fewer resources.
 

9dashline

Captain
Registered Member
Actually insane how low the training costs were for DeepSeek V3. Trained on only 2000 H800 for around 2 months. Total cost: $5.6 million
5 million was the CGI budget of Terminator 2... so for the price of fake graphics in 1991, 3 decades later China can get quasi-AGI.
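
Rough sanity check on the quoted numbers, for anyone who wants to redo the arithmetic. This is only a back-of-envelope sketch: I am reading "around 2 months" as 60 days, and the $2 per H800 GPU-hour rental rate is my assumption, not something stated in the quote. The function name is made up for illustration.

```python
# Back-of-envelope check of the quoted training cost.
# Inputs are rough: the quote says "2000 H800" and "around 2 months".
# The $2 per GPU-hour rental rate is an ASSUMPTION, not from the quote.

def training_cost_usd(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Rental-equivalent cost of running num_gpus GPUs around the clock for `days` days."""
    gpu_hours = num_gpus * days * 24  # total GPU-hours consumed
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    cost = training_cost_usd(num_gpus=2000, days=60, usd_per_gpu_hour=2.0)
    print(f"{cost / 1e6:.2f} million USD")  # prints "5.76 million USD"
```

Under those assumptions the result lands near the quoted $5.6 million. Note that this counts only GPU time at a rental rate; anything else is outside the calculation.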
 