Artificial Intelligence thread

GulfLander · Jan 31, 2025

"DeepSeek’s Chatbot Was Being Used By Pentagon Employees For At Least Two Days Before The Service Was Pulled From The Network; Early Version Has Been Downloaded Since Fall 2024[...]"
Link:


Bloomberg link:


tonyget · Jan 31, 2025

Dylan Patel claims that DeepSeek has 10,000 H800s and 10,000 H100s, and that the total server CapEx for DeepSeek is $1.3B.


european_guy · Jan 31, 2025

9dashline said:

DeepSeek is literally everywhere now... I've never seen any AI model get adopted this quickly. This is the real AI Diffusion...

Releasing R1 as open source, along with a detailed technical paper, is a genius 3D-chess geopolitical move.

We don't know if they did it on purpose, but the effect is now clear to anyone willing to see it (not everyone is, btw).

We were heading toward a dangerous oligopoly of tier-1 AI providers: closed, big US companies, backed by a clear geopolitical agenda of world domination through monopolizing the key enablers of our economic future (see, for instance, the restrictions on NVIDIA chips outside the US as a stark example: they are not only anti-China, they are anti-everybody).

Now more than ever, at the dawn of this new and powerful technological revolution, the world needs a counterbalance to US hegemony if we don't want to end up split into good guys and bad guys, with the good/bad-guy label handed out by a single actor according to its own interests alone.

As a European, I have to admit that today only China can be that counterbalance.

tphuang · Jan 31, 2025

huemens said:
Cerebras is not running the full version. They are running the 70B distilled version, which you can run on many other lower-powered chips.
Nvidia is running the full 671B.

But where does the Nvidia H200 token generation speed come from?

The Groq cloud token generation speed is quite fast, even accounting for its use of the 70B distilled version.

tonyget said:
Dylan Patel claims that DeepSeek has 10,000 H800s and 10,000 H100s, and that the total server CapEx for DeepSeek is $1.3B.


So now he says it's 20,000? I can believe the 10,000 H800s, but 10,000 H100s seem harder to believe.
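
For a sense of scale, Patel's two numbers can be combined into an implied CapEx per GPU. A minimal sketch, assuming the $1.3B figure covers the whole server build-out (GPUs plus host CPUs, networking and so on) and taking the 10,000 + 10,000 GPU counts as claimed, not verified; the function name is illustrative only:

```python
def implied_capex_per_gpu(total_capex_usd: int, gpu_counts: dict[str, int]) -> float:
    """Spread a total server CapEx figure evenly across all claimed GPUs."""
    total_gpus = sum(gpu_counts.values())
    if total_gpus <= 0:
        raise ValueError("need at least one GPU")
    return total_capex_usd / total_gpus

# Patel's claimed fleet and CapEx, as reported in this thread.
claimed_fleet = {"H800": 10_000, "H100": 10_000}
per_gpu = implied_capex_per_gpu(1_300_000_000, claimed_fleet)

print(f"{sum(claimed_fleet.values()):,} GPUs -> ${per_gpu:,.0f} per GPU")
```

Because "server CapEx" is an all-in number, the result is a blended cost per installed GPU, not the price of the card itself.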

european_guy said:
Releasing R1 as open source, along with a detailed technical paper, is a genius 3D-chess geopolitical move.

We don't know if they did it on purpose, but the effect is now clear to anyone willing to see it (not everyone is, btw).

We were heading toward a dangerous oligopoly of tier-1 AI providers: closed, big US companies, backed by a clear geopolitical agenda of world domination through monopolizing the key enablers of our economic future (see, for instance, the restrictions on NVIDIA chips outside the US as a stark example: they are not only anti-China, they are anti-everybody).

Now more than ever, at the dawn of this new and powerful technological revolution, the world needs a counterbalance to US hegemony if we don't want to end up split into good guys and bad guys, with the good/bad-guy label handed out by a single actor according to its own interests alone.

As a European, I have to admit that today only China can be that counterbalance.

I don't think DeepSeek is thinking about things geopolitically. It's a small new player looking to make a splash. And the best way to do this is by sharing its research. And now, it is famous.

huemens · Jan 31, 2025

tphuang said:
But where does the Nvidia H200 token generation speed come from?

It looks like the Nvidia number is not for a single chip but for an 8-card server. From Nvidia:

Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second.

Cerebras is probably getting 1,500 t/s from the 70B model on a single wafer-scale chip.
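
To put the two figures on a per-chip footing, the server number can be divided by its GPU count. A minimal sketch, assuming Nvidia's 3,872 tokens/s is aggregate throughput for the whole 8x H200 server and taking the 1,500 t/s Cerebras figure as the estimate it is; the function name is hypothetical. Even normalized, the comparison stays rough: the models differ (671B vs 70B), and an aggregate figure across many concurrent requests is not the same as single-request speed.

```python
def per_chip_throughput(total_tokens_per_s: float, num_chips: int) -> float:
    """Naive per-chip share of an aggregate tokens-per-second figure."""
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    return total_tokens_per_s / num_chips

# Nvidia's quoted figure: full 671B DeepSeek-R1 on one server with eight H200s.
h200_per_gpu = per_chip_throughput(3872, 8)
# Estimate from this thread: 70B distilled model on one Cerebras wafer-scale chip.
cerebras_per_chip = per_chip_throughput(1500, 1)

print(f"H200 (671B, per GPU): {h200_per_gpu:.0f} t/s")
print(f"Cerebras (70B, per chip): {cerebras_per_chip:.0f} t/s")
```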

tokenanalyst · Jan 31, 2025

-Meta's Yann LeCun is right: in the end, open source will yield better results.

-The closed-source, corporate, NatSec-filled, crypto-bro approach to AI is making the US bleed AI talent like crazy. For example, only a few people know how o1 and o3 work, compared to the millions of researchers who can study DeepSeek.

-China's more open and academic approach to AI is creating a lot of talent: by releasing their weights, code, and papers, companies like Alibaba and Tencent teach other academics how these things work, and those researchers can be hired in the future. Pretty much like with EVs, China is creating an ecosystem rather than a few corpos. The Chinese government should keep encouraging Chinese companies to maintain this openness as much as they can.

-On the other hand, US universities are increasingly reliant on research and models coming from China. That's not a bad thing, but if the NatSec stooges decide to ban Chinese AI models in the US, these universities are doomed unless the crypto bros plaguing Silicon Valley open up their models and research.

-In the best, best, best case for the stooges, chip controls MAYBE buy them a few years, and that doesn't even account for Chinese innovations in efficiency, DeepSeek being one example (the cope is still hard). China's semiconductor industry was once in disarray, but export controls pushed it to develop really fast, as if everything had been there from the start and just needed an incentive to organize itself. Again, what China is brewing will make the DeepSeek Swan event look like nothing.

https://twitter.com/i/web/status/1884708204321661206

subotai1 · Jan 31, 2025

iewgnem said:
It's not creating the model for competitive advantage that's interesting.
It's releasing the model so that everyone, including other hedge funds, can share in your advantage, which nullifies your advantage. That's what's interesting.

Not necessarily. By driving down the cost of compute and of all models, it makes things cheaper for them too. And the model is only a small part of the advantage. The real advantage lies in how you combine the model with other things (that nobody knows about).

tokenanalyst · Jan 31, 2025

subotai1 said:
Not necessarily. By driving down the cost of compute and of all models, it makes things cheaper for them too. And the model is only a small part of the advantage. The real advantage lies in how you combine the model with other things (that nobody knows about).

Yes. They didn't release their custom training stack, so that is their advantage, but they did tell researchers how to replicate their approach.

Biscuits · Jan 31, 2025

tokenanalyst said:

-Meta's Yann LeCun is right: in the end, open source will yield better results.

-The closed-source, corporate, NatSec-filled, crypto-bro approach to AI is making the US bleed AI talent like crazy. For example, only a few people know how o1 and o3 work, compared to the millions of researchers who can study DeepSeek.

-China's more open and academic approach to AI is creating a lot of talent: by releasing their weights, code, and papers, companies like Alibaba and Tencent teach other academics how these things work, and those researchers can be hired in the future. Pretty much like with EVs, China is creating an ecosystem rather than a few corpos. The Chinese government should keep encouraging Chinese companies to maintain this openness as much as they can.

-On the other hand, US universities are increasingly reliant on research and models coming from China. That's not a bad thing, but if the NatSec stooges decide to ban Chinese AI models in the US, these universities are doomed unless the crypto bros plaguing Silicon Valley open up their models and research.

-In the best, best, best case for the stooges, chip controls MAYBE buy them a few years, and that doesn't even account for Chinese innovations in efficiency, DeepSeek being one example (the cope is still hard). China's semiconductor industry was once in disarray, but export controls pushed it to develop really fast, as if everything had been there from the start and just needed an incentive to organize itself. Again, what China is brewing will make the DeepSeek Swan event look like nothing.

https://twitter.com/i/web/status/1884708204321661206

Ironically, that's what killed Imperial China. After Europe got a nearly 100% free cash injection from a whole continent of South American idiots, it could pay for huge numbers of people to study and become literate. While Europe was making knowledge available to millions, the top knowledge in Imperial China, for either practical or institutional reasons, was available only to a handful of people.

Statistically, the side with millions of people able to give their input on a subject will always win over the side with just a few thousand, or even just hundreds.

That's also a good explanation for why US procurement is so ridiculously ineffective. It's not just a corruption thing; it's also about too many people hoarding their knowledge because they fear that sharing their discoveries will make them redundant.

The US's new weapons are all "relics" (in the sense that they are world-class, but available only in painfully small numbers and maintainable only through supply lines that barely exist). The exception is the F-35, but that program basically broke the back of the whole military budget and sucked funding from god knows how much else.

A near-China-sized economy that's putting 4%+ of GDP into its military should not struggle to field at least two 6th-gen programs.

iewgnem · Jan 31, 2025

tonyget said:
Dylan Patel claims that DeepSeek has 10,000 H800s and 10,000 H100s, and that the total server CapEx for DeepSeek is $1.3B.


I think it's fair to say that only analysis by people who can point to their own predictions made before the fact has any credibility.
All after-the-fact analysis falsely assumes complete information, right after R1 demonstrated that these analysts did not have complete information.
