Llama 4 is out

Very disappointed in this release. The only noteworthy thing, imo, is the long context window. On everything else, it's behind the competition.
By their own admission, they took "inspiration" from DeepSeek, for instance by adopting a MoE (mixture-of-experts) architecture instead of their classic "dense" one, but they also introduced some novelties of their own (a minimal sketch of the dense/MoE difference is below).
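For anyone who hasn't seen the two side by side, here is a minimal sketch of a dense FFN vs. a top-k routed MoE layer in PyTorch. The layer sizes, expert count, and top-2 routing are illustrative assumptions, not Llama 4's actual configuration:

```python
# Dense FFN vs. top-2 routed MoE, minimal sketch.
# All dimensions and hyperparameters here are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """Classic dense feed-forward block: every token pays for all parameters."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.silu(self.up(x)))

class MoEFFN(nn.Module):
    """Mixture-of-experts: a router sends each token to only top_k experts,
    so compute per token stays small while total capacity grows."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [DenseFFN(d_model, d_ff) for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(4, 512)
print(MoEFFN()(x).shape)  # torch.Size([4, 512])
```

The point of the trade-off: total parameters scale with n_experts, but each token only pays for top_k experts' worth of compute, which is why MoE models quote both "total" and "active" parameter counts.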
Their model is also natively multimodal (text, images, video), which may explain the huge pretraining dataset of 30T tokens (roughly 2× what Qwen and DeepSeek used). They pretrained on 32K GPUs.
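To put those numbers in perspective, a back-of-envelope using the standard ~6·N·D training-FLOPs rule. Only the 30T tokens and 32K GPUs come from the post; the active parameter count and per-GPU throughput are my assumptions:

```python
# Back-of-envelope pretraining cost via the ~6 * N * D FLOPs approximation.
active_params = 17e9   # ASSUMED active params per token for an MoE of this class
tokens = 30e12         # pretraining tokens, per the announcement
flops = 6 * active_params * tokens          # ~3.1e24 FLOPs

gpu_flops = 4e14       # ASSUMED ~400 TFLOP/s sustained per GPU (BF16 at ~40% MFU)
gpus = 32_000
days = flops / (gpu_flops * gpus) / 86_400
print(f"{flops:.2e} FLOPs ≈ {days:.1f} GPU-cluster days")  # ~2.8 days
```

Treat that as a loose lower bound: it ignores the vision encoders, restarts, data pipeline stalls, and real-world MFU, so actual wall-clock time would be much longer.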
The Llama approach has always been: simple architecture + brute force... not a wrong idea when you have unlimited hardware. I hope Huawei and others fill the GPU gap soon, so Chinese labs can compete on a nearly equal footing.
Anyhow, kudos to them for open-sourcing the models. They are definitely the most open among the US companies: Google open-sources only its tier-3 models, and OpenAI... well, they are a joke as far as "Open" goes. At least Anthropic, the most closed of the bunch, doesn't pretend.
Maybe the only saving grace will come when their big Behemoth model finishes training and gets distilled down into a capable smaller model, with thinking added on top (see the distillation sketch below).
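To be concrete about what "distilled down" usually means: training a small student to match the big teacher's softened output distribution. A minimal sketch; the temperature, blend weight, and loss form are generic knowledge distillation, not Meta's actual recipe:

```python
# Minimal logit-distillation loss: soft-target KL plus hard-label CE.
# Hyperparameters are illustrative, not anyone's production setup.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL(teacher || student) on temperature-softened logits
    with the usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: vocab of 10, batch of 4.
s = torch.randn(4, 10, requires_grad=True)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distill_loss(s, t, y))
```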
DeepSeek basically
But then why throw billions at AI research every year if all you do is copy from other open-source projects? Hopefully, once they catch up (?), they can innovate again.