Artificial Intelligence thread

Overbom

Brigadier
Registered Member
Llama 4 is out


By their own admission, they took "inspiration" from DeepSeek, for instance adopting an MoE (mixture-of-experts) architecture instead of their classic "dense" model, but they also introduced some novelties of their own.
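For those unfamiliar with the distinction: in a dense model every token passes through the full feed-forward block, while an MoE layer routes each token to only a few "expert" blocks, so only a fraction of the weights are active per token. A minimal sketch, with all names and sizes purely illustrative (not Meta's or DeepSeek's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to its top-k experts.

    Illustrative only -- production MoE (DeepSeek, Llama 4) adds shared
    experts, load-balancing losses, and fused kernels.
    """
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # only top_k of n_experts ran per token: fewer "active" params
```

A dense layer is the degenerate case n_experts = top_k = 1; the MoE trick is that total capacity grows with n_experts while per-token compute stays proportional to top_k.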

Their model is also natively multimodal (text, images, video), and this may explain the huge pretraining dataset of 30T tokens (about 2x compared to Qwen and DeepSeek). They pretrained on 32K GPUs.
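To get a feel for what 30T tokens on 32K GPUs means, here is a back-of-envelope estimate using the common C ≈ 6·N·D approximation. Everything except the post's 30T tokens and 32K GPUs is an assumption (17B active parameters per the Maverick discussion below, H100-class GPUs at ~40% sustained utilization), and it ignores attention cost and the multimodal encoders:

```python
# Rough pretraining cost via the C ~ 6*N*D rule of thumb (FLOPs ~ 6 x
# active params x tokens). Assumption-laden: real runs also pay for
# attention, vision encoders, rejected experiments, etc.
n_active = 17e9                  # assumed active params per token (Maverick)
tokens = 30e12                   # from the post
flops = 6 * n_active * tokens
print(f"total compute ~ {flops:.2e} FLOPs")          # ~3.1e24

gpus = 32_768                    # from the post
eff_per_gpu = 0.4 * 1e15         # assumed ~40% of ~1 PFLOP/s BF16 peak
days = flops / (gpus * eff_per_gpu) / 86_400
print(f"wall-clock ~ {days:.1f} days")               # ~2.7 days, a lower bound
```

The point is less the exact number than its shape: with that much hardware, even a 30T-token run over the active parameters is a matter of days, which is exactly the brute-force advantage described below.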

The Llama approach has always been simple architecture + brute force... not a wrong idea when you have unlimited hardware. I hope Huawei and others will soon fill the GPU gap, so as to allow Chinese labs to compete on an almost equal footing.

Anyhow, kudos to them for open-sourcing the models. They are definitely the most open among US companies: Google open-sources only its tier-3 models, and OpenAI... well, they are a joke as far as "Open" goes; at least Anthropic, the most closed one, does not pretend.
Very disappointed in this release. The only noteworthy thing, IMO, is the long context window. On everything else, it's behind the competition.

Maybe the only saving grace will come when their big Behemoth model completes training and is then distilled down to a capable smaller one, with thinking added on top.

DeepSeek basically
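For context on the distillation step: the generic technique trains a small "student" to match a big "teacher's" output distribution. A minimal sketch of classic logit distillation (Hinton et al., 2015); note that DeepSeek's published R1 recipe actually distilled by generating reasoning traces with the big model and fine-tuning small models on them, so this is the textbook version, not their exact method:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Logits: (batch, vocab). The temperature softens both distributions so
    the student also learns the teacher's relative preferences among
    near-miss tokens, not just the argmax.
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # t^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t
```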


But then why throw billions every year at AI research if all you do is copy from other open-source projects? Hopefully after they catch up (?) they can innovate again.
 

european_guy

Junior Member
Registered Member

Sorry to disagree, but they are better than DS.

This is a non-reasoning model, so you have to compare apples with apples.

The Maverick model has 400B parameters (less than DS's 671B) with 17B active (about half of DS's 37B), and it has performance comparable to DS V3.1 (the latest one, released just a couple of weeks ago); on top of that, it also has native multimodality and a 1M context length.
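The active-parameter count is what drives per-token inference cost, which is the real story here. A quick illustrative comparison using the usual ~2 FLOPs per active weight per token approximation (parameter figures from this post; DS V3's 37B activated is from its technical report):

```python
# Per-token decode FLOPs scale with *active* params (~2 FLOPs per weight),
# not total params -- that is why MoE models are cheap to serve.
models = {
    "Llama 4 Maverick": {"total": 400e9, "active": 17e9},
    "DeepSeek V3":      {"total": 671e9, "active": 37e9},
}
for name, p in models.items():
    flops_per_token = 2 * p["active"]     # matmul-dominated approximation
    print(f"{name}: ~{flops_per_token:.1e} FLOPs/token, "
          f"{p['active'] / p['total']:.0%} of weights active")
```

Total parameters still set the memory footprint, though: Maverick needs roughly half of V3's per-token compute, but hosting it still means holding all 400B weights.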

Going into the technical details, they have introduced interesting new ideas in the attention layers, giving a powerful model that runs faster on the same hardware. Native multimodality is also a powerful feature, and many will take inspiration from it. They have also introduced new ideas in training, but unfortunately these are less documented and cannot be inferred just by looking at the sources.

In DeepSeek's favor, there is much deeper and more detailed documentation, with technical papers on every aspect of the model, including training. DeepSeek is way more open than Llama 4; nevertheless, Llama 4 is currently the top US model as far as openness goes.
 

Eventine

Junior Member
Registered Member
The version of Llama 4 they released to the public isn't performing up to par. Many people across the community are reporting bad results, even below QwQ-32B for the 400B model. We'll see if this is a parameters problem and gets fixed in the coming days.

As it is, in terms of pure performance: Google = OpenAI > Anthropic >= DeepSeek > Grok 3 > Llama 4 currently, although Grok 3 has the benefit of being uncensored, as previously stated.

All eyes are on OpenAI's full o3 / o4-mini release later this month and DeepSeek's R2 release around the same time, to see if they can take back the crown from Google. Anthropic is basically still stuck in its coding niche and increasingly threatened by other players. Meta needs to correct course quickly or risk dropping out of the race.

With more big AI labs moving into multimodal models, compute requirements will likely increase; OpenAI recently had to turn off public access to their image generation because it was taking too much GPU time.

Chinese players need to cooperate more on optimizations; there's a high compute wall for multimodal that needs another DeepSeek moment.
 

luminary

Senior Member
Registered Member
Well, the great thing about DeepSeek and Qwen making their model weights and RL process available is that anyone in the world can use their algorithms to create their own open-source reasoning model, so that we the public have the power of controlling AI and are no longer just subservient to the tech overlords in Silicon Valley, whose goal is to achieve global domination and techno-feudalism over the rest of us, whom they want to rule over in serfdom.
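Concretely, the published recipe behind the R1-style reasoning models is GRPO (group relative policy optimization, from the DeepSeekMath and R1 papers): sample a group of answers per prompt, score them with a verifiable reward, and normalize each reward against its own group, so no critic network is needed. A minimal sketch of the advantage step (illustrative, not DeepSeek's production code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as in GRPO.

    rewards: (n_prompts, group_size), e.g. 1.0 if a sampled answer is
    correct and 0.0 otherwise. Each sample is judged only against the
    other samples for the *same* prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled answers each, graded right/wrong.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

These advantages then feed a PPO-style clipped policy-gradient loss with a KL penalty toward the reference model; the whole pipeline is reproducible from the papers, which is exactly the point being made here.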
The true intentions of the techno culture in the SF Bay Area have been getting a lot of attention recently.
Basically white supremacy and eugenics in a nerdy, sanctimonious way:
https://www.reddit.com/r/SneerClub/comments/1jqouyl
 

european_guy

Junior Member
Registered Member

Here is the technical report:


Smaller than DS R1, with 20B active parameters (about half of R1's) and 200B in total (about 1/3 of R1's), it gives the same or better results in STEM thanks to improved RL (reinforcement learning) based on some published papers.

Fun fact: the only test where it performs way worse than R1 is called SimpleQA, a set of general-knowledge questions (though not as easy as the name might suggest).

For general knowledge you need a huge model that can memorize a lot of information, more than a smart one. R1 is 3 times bigger, and that's why it is better. Gemini 2.5 and Grok 3 also seem to be very big models, while OpenAI's o3-mini should be similar in size to this one, so roughly 100B to 200B parameters.
 

Eventine

Junior Member
Registered Member

Looks like OpenAI is gearing up for their next set of releases, o4-mini and full o3. I wonder if this will affect DeepSeek's R2 release. Clearly DeepSeek would prefer to release in April (since that's what they originally promised), but with an impending OpenAI release, they'd want to make sure they can match or beat OpenAI's new models to keep their magic reputation.

As I've said before, the race is just getting started. Now that all the big AI labs in the West have made their play, it's time for China to answer. I'm also looking to see where Grok goes next, as it's been a while since Grok 3's release (in the AI race, a few months is "a while").
 

AI Scholar

New Member
Registered Member
Since their release, I've put both Gemini 2.5 Pro and DeepSeek V3-0324 through extensive testing across programming, web design, creative writing, casual chatting, and AI development tasks. Here are my thoughts:

Front-End Design:
DeepSeek takes the lead. In my experience, it generates more visually refined and aesthetically pleasing designs compared to Gemini.

Creative Writing:
Gemini delivers more consistent writing, especially over long contexts, but DeepSeek's stories are more entertaining. DeepSeek wins thanks to a more enjoyable writing style and less censorship.

General Chatting:
DeepSeek has a more pleasant and based conversational style. I’ve found myself defaulting to it for casual discussions.

Coding/AI dev:
Gemini tends to overengineer solutions, often producing excessively complex code when simplicity would suffice. While it clearly has stronger reasoning capabilities than DeepSeek V3, I’ve achieved similar (sometimes better) results with V3 by optimizing my prompts.

Workflow-wise, Gemini has been quite frustrating to integrate, whereas DeepSeek V3 feels smoother and more intuitive for actual development.
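For anyone curious what the smoother integration looks like: DeepSeek exposes an OpenAI-compatible endpoint, so a minimal client is a few lines. The base URL and model name below match DeepSeek's docs at the time of writing, but treat them (and the prompt) as assumptions:

```python
# Minimal DeepSeek V3 call through the OpenAI-compatible API.
# Assumes the `openai` Python package and a DEEPSEEK_API_KEY env var.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",   # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-chat",                 # V3; "deepseek-reasoner" is R1
    messages=[
        {"role": "system", "content": "You are a concise front-end engineer."},
        {"role": "user", "content": "Draft a minimal responsive navbar in HTML/CSS."},
    ],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```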

Gemini is undeniably much smarter in raw capability, and it should be better for many use cases, but DeepSeek V3 has become my preferred model for nearly all of my tasks, as it is more usable and enjoyable in practice.

If V3 already works for my needs so well, I can only imagine how useful DeepSeek R2 will be. I find that DeepSeek V3 is very underrated, and R2 will be a big shock to those not paying attention.
 