Artificial Intelligence thread

Overbom

Brigadier
Registered Member
Llama 4 is out


By their own admission, they took "inspiration" from DeepSeek, for instance adopting a mixture-of-experts (MoE) architecture instead of their classic dense models, but they also introduced some novelties of their own.
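
For anyone unfamiliar with the dense-vs-MoE distinction, here is a minimal sketch in plain PyTorch of a top-k MoE feed-forward layer. All dimensions and expert counts are illustrative, not Llama 4's or DeepSeek's actual configuration; the point is only that each token activates k experts, so active parameters are a small fraction of total parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy top-k mixture-of-experts FFN. Sizes are illustrative only."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

A dense model is the degenerate case where every token goes through one big FFN; the MoE version keeps total capacity high while paying the compute cost of only a couple of experts per token.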

Their model is also natively multimodal (text, images, video), which may explain the huge pretraining dataset of 30T tokens (about 2× Qwen and DeepSeek). They pretrained on 32K GPUs.

Llama's approach has always been simple architecture plus brute force... not a wrong idea when you have unlimited hardware. I hope Huawei and others will soon fill the GPU gap, allowing Chinese labs to compete on an almost equal footing.

Anyhow, kudos to them for open-sourcing the models. They are definitely the most open among US companies: Google open-sources only its tier-3 models, and OpenAI... well, they are a joke where "open" is concerned; at least Anthropic, the most closed of them, does not pretend.
Very disappointed in this release. The only noteworthy thing, imo, is the long context window. On everything else, it's behind the competition.

Maybe the only saving grace will be when their behemoth model completes training and is then distilled down to a capable smaller one, with thinking added on top.

DeepSeek, basically.

But then why throw billions at AI research every year if all you do is copy from other open-source projects? Hopefully after they catch up (?) they can innovate again.
 

european_guy

Junior Member
Registered Member
Very disappointed in this release. The only noteworthy thing, imo, is the long context window. On everything else, it's behind the competition.

Maybe the only saving grace will be when their behemoth model completes training and is then distilled down to a capable smaller one, with thinking added on top.

DeepSeek, basically.

But then why throw billions at AI research every year if all you do is copy from other open-source projects? Hopefully after they catch up (?) they can innovate again.

Sorry, but they are better than DS.

This is a non-reasoning model, so you have to compare apples to apples.

The Maverick model has 400B parameters (fewer than DeepSeek's 671B) with 17B active (about half of DeepSeek's 37B), and it performs comparably to DS V3.1 (the latest version, released just a couple of weeks ago); on top of that, it adds native multimodality and a 1M-token context length.
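
To put those figures in perspective, here is a quick back-of-the-envelope comparison of the two models' sparsity (numbers as quoted above; this is arithmetic, not a benchmark):

```python
# Active-vs-total parameter ratios for the two MoE models discussed above.
models = {
    "Llama 4 Maverick": (400, 17),   # (total B, active B) as quoted in this thread
    "DeepSeek V3":      (671, 37),
}
for name, (total, active) in models.items():
    print(f"{name}: {active}B of {total}B active per token ({100 * active / total:.1f}%)")
# Llama 4 Maverick: 17B of 400B active per token (4.2%)
# DeepSeek V3: 37B of 671B active per token (5.5%)
```

So Maverick is not just smaller overall; it is also sparser per token, which is consistent with the speed claim below.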

Getting into the technical details, they have introduced interesting new ideas in the attention layers, yielding a powerful model that runs faster on the same hardware. Native multimodality is also a powerful feature that many will take inspiration from. They have also introduced new ideas in training, but unfortunately these are less well documented and cannot be deduced just by looking at the sources.
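
The post doesn't name them, but Meta's release notes describe an "iRoPE" scheme: most attention layers use rotary position embeddings (RoPE), while interleaved layers drop positional encoding entirely, which is what they credit for the very long context (together with inference-time attention temperature scaling). A minimal sketch of such an interleaving schedule, with a hypothetical period (the real layer counts are not assumed here):

```python
# Hedged sketch of an interleaved layer schedule in the spirit of Meta's
# described "iRoPE" idea: most attention layers apply rotary position
# embeddings, and every Nth layer uses no positional encoding at all.
# The period and layer count here are assumptions, not Llama 4's real config.
N_LAYERS = 48
NOPE_PERIOD = 4  # assumption: one NoPE layer in every 4

def layer_uses_rope(layer_idx: int) -> bool:
    """True if this attention layer should apply RoPE."""
    return (layer_idx + 1) % NOPE_PERIOD != 0

schedule = ["RoPE" if layer_uses_rope(i) else "NoPE" for i in range(N_LAYERS)]
print(schedule[:8])  # ['RoPE', 'RoPE', 'RoPE', 'NoPE', 'RoPE', 'RoPE', 'RoPE', 'NoPE']
```

The claimed benefit is that the NoPE layers are not tied to any trained position range, which helps the model generalize to contexts longer than those seen in pretraining.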

DeepSeek's advantage is much deeper and more detailed documentation: technical papers covering every aspect of the model, including training. DeepSeek is far more open than Llama 4, but Llama 4 is nevertheless the top in the US at the moment regarding openness.
 

Eventine

Junior Member
Registered Member
The version of Llama 4 they released to the public isn't performing up to par. Many people across the community are reporting bad results, even below QwQ-32B for the 400B model. We'll see if this is a parameter problem that gets fixed in the coming days.

As it is, in terms of pure performance, Google = OpenAI > Anthropic >= DeepSeek > Grok 3 > Llama 4 currently, although Grok 3 has the benefit of being uncensored, as previously stated.

All eyes are on OpenAI's full o3 / o4-mini release later this month and DeepSeek's R2 release around the same time, to see if they can take back the crown from Google. Anthropic is basically still stuck in its coding niche and increasingly threatened by other players. Meta needs to correct course quickly or be in danger of dropping out of the race.

With more big AI labs moving into multimodal models, compute requirements will likely increase; OpenAI recently had to turn off public access to its image generation because it was taking too much GPU time.

Chinese players need to cooperate more on optimizations; there's a high compute wall for multimodal that needs another DeepSeek moment.
 


luminary

Senior Member
Registered Member
Well, the great thing about DeepSeek and Qwen making their model weights and RL process available is that anyone in the world can use their algorithms to create their own open-source reasoning model. That way we, the public, have the power to control AI and are no longer just subservient to the tech overlords in Silicon Valley, whose goal is global domination and techno-feudalism over the rest of us, whom they want to rule over as serfs.
The true intentions of the techno culture in the SF Bay Area have been getting a lot of attention recently:
basically white supremacy and eugenics in a nerdy, sanctimonious way:
https://www.reddit.com/r/SneerClub/comments/1jqouyl
 