Artificial Intelligence thread

Biscuits

Colonel
Registered Member
It is funny to see the newer Chinese open-source models now comparing themselves to DeepSeek first, before Llama and GPT-4o.

Guys, show some self-confidence. It's absolutely pathetic of Chinese culture to only be proud of something once it's been validated in Western countries. The level of discourse on Chinese social media around DeepSeek in the past two days is several orders of magnitude higher than even last week.
It's just normal to compare to the state of the art first and foremost, no?

I don't see any US-based models comparing themselves to Lucie AI. Why are they being so pathetic, refusing to be proud and compare directly to the French AI, instead seeking validation by comparing to o1 or DeepSeek?
 

Overbom

Brigadier
Registered Member
It is funny to see the newer Chinese open-source models now comparing themselves to DeepSeek first, before Llama and GPT-4o.

Guys, show some self-confidence. It's absolutely pathetic of Chinese culture to only be proud of something once it's been validated in Western countries. The level of discourse on Chinese social media around DeepSeek in the past two days is several orders of magnitude higher than even last week.
I think comparing to SOTA models is normal, and something only the very best have earned.

The best way of showing improvements is comparing to the current SOTA. If Qwen came out and compared itself to Mistral or whatever, it would be ridiculed. First you set the stage (closed source), then you compare against the top of that field (plus open source). If you are the best, good. If you are not, you're trash, unless you offer something really good elsewhere (computational requirements/efficiency, unique skills on specific things, being the top model for a specific major country, etc.).

Closed-source models are expected to beat everything, whereas open-source models are only expected to beat other open-source models. However, if an open-source model makes the jump across the pond and beats closed-source models as well, that is a really huge deal. Couple that with such huge test-time compute efficiency gains, and it is a bombshell in the industry.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
I think comparing to SOTA models is normal, and something only the very best have earned.

The best way of showing improvements is comparing to the current SOTA. If Qwen came out and compared itself to Mistral or whatever, it would be ridiculed. First you set the stage (closed source), then you compare against the top of that field (plus open source). If you are the best, good. If you are not, you're trash, unless you offer something really good elsewhere (computational requirements/efficiency, unique skills on specific things, being the top model for a specific major country, etc.).

Closed-source models are expected to beat everything, whereas open-source models are only expected to beat other open-source models. However, if an open-source model makes the jump across the pond and beats closed-source models as well, that is a really huge deal. Couple that with such huge test-time compute efficiency gains, and it is a bombshell in the industry.
I mean, for a while now there have been competitive SOTA Chinese models across several AI areas, and putting DeepSeek into their comparisons is only happening now.
 

european_guy

Junior Member
Registered Member

The entire Huawei and Ascend situation with DeepSeek is just utterly embarrassing. How could they not even be bothered to work with DeepSeek before this? Read the photo: DeepSeek-V3 is about to come online!

V3 has been out for a month, and they are only bringing the integration out now, when everyone on Chinese social media is talking about DeepSeek.

Huawei should be ashamed of themselves.

Saw some insanely fast speeds for Groq running it.

YouTuber Matt Wolfe, using Groq, got a 275 t/s inference speed.

Whereas the DeepSeek API speed was 15 t/s (when I checked a few days ago on OpenRouter). Edit: I didn't remember correctly; that speed was for the full R1 model.

And I agree, I think inference is much more open for disruption of Nvidia. A lot less of a moat.

You probably already know this, but I'd like to clarify that the so-called distilled versions of DS are nothing more than the original Qwen and Llama models with their weight (parameter) values modified; the architecture remains 100% the original Qwen or Llama. So if Huawei/Groq already had, for instance, a Llama implementation (and of course they did), they also support the DS-distilled Llama out of the box.

Supporting the original DS R1, on the other hand, is another story. As already mentioned, DS introduced many architectural changes compared to Llama (Qwen is 95% Llama from an architecture point of view).

Of course, the modified weights make the distilled versions perform much better than the originals, but from the hardware/LLM-architecture POV nothing has changed.
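To make that concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public DeepSeek-R1-Distill-Llama-8B checkpoint: the distill loads through the stock Llama classes, precisely because only the weights differ.

```python
# Minimal sketch, assuming the `transformers` library and the public
# DeepSeek-R1-Distill-Llama-8B checkpoint: the distill loads through the
# stock Llama classes, since only the weights differ from vanilla Llama.
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)  # no custom model code needed

# The config reports the unmodified Llama architecture.
print(model.config.architectures)  # ['LlamaForCausalLM']
```

Any stack that already serves Llama should therefore be able to load this checkpoint unchanged.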

IMHO, the reason neither Llama 4 nor an improved QwQ reasoning model has come out in recent weeks (although both were rumored to) could be precisely these distilled DS versions: imagine releasing a Llama 4 weaker than the same Llama model distilled from DS! That would be very embarrassing.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
You probably already know this, but I'd like to clarify that the so-called distilled versions of DS are nothing more than the original Qwen and Llama models with their weight (parameter) values modified; the architecture remains 100% the original Qwen or Llama. So if Huawei/Groq already had, for instance, a Llama implementation (and of course they did), they also support the DS-distilled Llama out of the box.

Supporting the original DS R1, on the other hand, is another story. As already mentioned, DS introduced many architectural changes compared to Llama (Qwen is 95% Llama from an architecture point of view).

Of course, the modified weights make the distilled versions perform much better than the originals, but from the hardware/LLM-architecture POV nothing has changed.

IMHO, the reason neither Llama 4 nor an improved QwQ reasoning model has come out in recent weeks (although both were rumored to) could be precisely these distilled DS versions: imagine releasing a Llama 4 weaker than the same Llama model distilled from DS! That would be very embarrassing.
Well, the R1 distills of these models still have all the reasoning/thinking behavior that the foundation models don't have. I just ran a query on my local 7B version right now, and it spewed a lot of thinking tokens before it gave me the answer.

So I'm not convinced that they would be able to support these out of the box.
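Those thinking tokens are easy to spot programmatically. A rough sketch, assuming an R1-style distill that wraps its chain of thought in <think>...</think> tags before the final answer (the raw_output string is a made-up example, not real model output):

```python
import re

# R1-style distills emit their chain of thought inside <think>...</think>
# tags before the final answer; the base Qwen/Llama models never do this.
raw_output = "<think>The user asks for 6*7. 6*7 = 42.</think>The answer is 42."

match = re.match(r"<think>(.*?)</think>(.*)", raw_output, flags=re.DOTALL)
if match:
    thinking, answer = match.group(1).strip(), match.group(2).strip()
else:  # no thinking block produced
    thinking, answer = "", raw_output.strip()

print("thinking:", thinking)  # the extra reasoning tokens the base model never emits
print("answer:", answer)
```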
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member

My commentary on Qwen-2.5-Max: I think they got a little unlucky here, with DeepSeek beating them to the punch and stealing everyone's heart away. They may have a better model than V3 now, but it came a month too late and without a comparably addictive reasoning model to go with it.
 

subotai1

Junior Member
Registered Member
I have noticed something. Think-tank national security crap and greed have led to the closing-off of US AI research and AI models, which is making US universities more and more reliant on Chinese open-source models to do their research. It is very ironic, to say the least.
And with the recent funding and grant freezes in the US (which may or may not stand), university researchers will be even more cost-conscious and likely to use DS and other models.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member

Well, this is an update from the CTO of Hyperbolic, who I assume knows a few things.

Apparently, DeepSeek V3 was designed to work for inference on the Ascend 910C from day 1.

Inference performance on the 910C is 60% of the H100's and could be higher with optimization. That seems to me like the kind of thing Huawei can improve on pretty quickly.

That would mean the stuff about Huawei Cloud not being ready for DeepSeek is purely a Huawei Cloud issue, because DeepSeek made itself compatible with Ascend, as they should!
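As a rough back-of-the-envelope on what that 60% figure implies (the absolute H100 throughput below is a hypothetical placeholder, not a benchmark):

```python
# Back-of-the-envelope sketch using only the quoted 60%-of-H100 figure.
# The H100 throughput here is a hypothetical placeholder, not a benchmark.
h100_tps = 1000.0      # hypothetical per-GPU inference throughput (tokens/s)
ascend_ratio = 0.60    # quoted: 910C inference is ~60% of H100

ascend_tps = h100_tps * ascend_ratio
chips_per_h100 = h100_tps / ascend_tps

print(f"910C throughput: ~{ascend_tps:.0f} t/s")                # ~600 t/s
print(f"910Cs needed to match one H100: ~{chips_per_h100:.2f}")  # ~1.67
```

In other words, at 60% per chip you need roughly 1.7 Ascend 910Cs to match one H100 for inference, before any further optimization.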
 

Fatty

Junior Member
Registered Member

Well, this is an update from the CTO of Hyperbolic, who I assume knows a few things.

Apparently, DeepSeek V3 was designed to work for inference on the Ascend 910C from day 1.

Inference performance on the 910C is 60% of the H100's and could be higher with optimization. That seems to me like the kind of thing Huawei can improve on pretty quickly.

That would mean the stuff about Huawei Cloud not being ready for DeepSeek is purely a Huawei Cloud issue, because DeepSeek made itself compatible with Ascend, as they should!


AMD is making DeepSeek compatible too.
Inference is the largest expenditure, so this may put a large crack in Nvidia's moat.
 