"DeepSeek’s Chatbot Was Being Used By Pentagon Employees For At Least Two Days Before The Service Was Pulled From The Network; Early Version Has Been Downloaded Since Fall 2024[...]"
Link:
Bloomberg link:
DeepSeek is literally everywhere now... I have never seen any AI model get adopted this quickly. This is the real AI diffusion...
But where does the Nvidia H200 token generation speed come from?
Cerebras is not running the full version. They are running the 70B distilled version, which you can run on many other lower-powered chips.
Nvidia is running the full 671B.
So now he says it's 20,000? I can believe the 10,000 H800s, but 10,000 H100s seems harder to believe.
Dylan Patel claims that DeepSeek has 10,000 H800s and 10,000 H100s, and that the total server CapEx for DeepSeek is $1.3B.
I don't think DeepSeek is thinking about things geopolitically. It's a small new player looking to make a splash, and the best way to do that is by sharing its research. And now it is famous.
To release R1 as open source, together with a detailed technical paper, is a 3D-chess geopolitical genius move.
We don't know if they did it on purpose, but the effect is now clear to anyone who wants to see it (not everyone wants to see it, btw).
We were heading toward a dangerous oligopoly of tier-1 AI providers: closed, big US companies, supported by a clear geopolitical agenda of world domination through monopolizing the key enablers of our economic future (for instance, the restriction of NVIDIA chips outside the US is a stark example; it is not only anti-China, it is anti-everybody).
Never more than today, at the dawn of this new and powerful technological revolution, has the world needed a counterbalance to US hegemony, if we don't want to end up split into good guys and bad guys, where the good/bad label is handed out by a single actor according to its sole interests.
As a European, I have to admit that today only China can be that counterbalance.
But where does the Nvidia H200 token generation speed come from?
Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second.
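For a rough sense of scale, the quoted figures can be broken down per GPU. This is only a back-of-the-envelope sketch: it assumes the 3,872 tokens per second is aggregate throughput for the whole eight-GPU server (an "up to" peak figure) and that the load splits evenly across GPUs. The function name is made up for illustration.

```python
# Figures as quoted above; the "up to" wording means these are peak numbers.
SERVER_TOKENS_PER_SECOND = 3872  # whole server, assumed aggregate throughput
GPUS_PER_SERVER = 8              # eight H200 GPUs in one server


def per_gpu_throughput(server_tokens_per_s: float, gpus: int) -> float:
    """Average tokens per second per GPU, assuming an even split (hypothetical helper)."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return server_tokens_per_s / gpus


if __name__ == "__main__":
    rate = per_gpu_throughput(SERVER_TOKENS_PER_SECOND, GPUS_PER_SERVER)
    print(f"{rate:.0f} tokens/s per GPU on average")  # prints "484 tokens/s per GPU on average"
```

So the headline number works out to roughly 484 tokens per second per H200 on average, under those assumptions.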
Not necessarily. By driving down the cost of compute and of all models, it becomes cheaper for them too. And the model is only a small part of the advantage. It's how you combine the model with other things (that nobody knows about) where the real advantage lies.
It's not creating the model to gain a competitive advantage that's interesting.
It's releasing the model so that everyone, including other hedge funds, can share in your advantage, which nullifies your advantage. That's what's interesting.
Yes, they didn't release their custom training stack, so that is their advantage, but they do tell researchers how to replicate their approach.
Ironically, it's what killed Imperial China. After Europe got the nearly 100% free cash injection from a whole continent of South American idiots, it could pay for so many people to study and become literate. While Europe was making knowledge available to millions, the top knowledge in Imperial China was, for either practical or institutional reasons, only available to a handful of people.
-Meta's Yann LeCun is right: open source will yield better results in the end.
-The closed-source, corporate, NatSec-filled, crypto-bro approach to AI is making the US bleed AI talent like crazy. For example, only a few people know how o1 and o3 work, compared to the millions of researchers who can study DeepSeek.
-China's more open and academic approach to AI is creating a lot of talent. When companies like Alibaba and Tencent release their weights, code and papers, they teach other academics how these things work, and those researchers can be hired in the future. Pretty much like with EVs, China is creating an ecosystem rather than a few corpos. The Chinese government should keep encouraging Chinese companies to maintain this openness as much as they can.
-On the other hand, US universities are increasingly reliant on the research and models coming from China to do their own research. That's not a bad thing, but if the NatSec stooges decide to ban Chinese AI models in the US, these universities are doomed unless the crypto bros plaguing Silicon Valley open up their models and research.
-In the best, best, best case for the stooges, chip controls MAYBE buy them a few years, but that doesn't even account for China's innovations in efficiency, for example DeepSeek (the cope is still hard). China's semiconductor industry was once in disarray, but export controls pushed it to develop really fast, as if everything was there from the start and only needed an incentive to organize itself. Again, what China is brewing will make the DeepSeek swan event look like nothing.
I think it's fair to say that only analysis by people who can point to their own prediction before the fact has any credibility.
Dylan Patel claims that DeepSeek has 10,000 H800s and 10,000 H100s, and that the total server CapEx for DeepSeek is $1.3B.