Artificial Intelligence thread

european_guy

Junior Member
Registered Member
New Kimi report on Muon scalability. Btw, I have no idea what Muon is.


Models are trained in batches. For example, during pre-training many chunks of text are fed in at each step, and the model has to predict the next token for all of them. The predicted tokens are then compared with the actual ground-truth tokens, and the error is used to compute what are called gradients. These mainly tell you in which direction (increase or decrease) each parameter of the model (the weights) should change to reduce the error, i.e. to better predict the actual tokens the next time around.
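To make this concrete, here is a minimal sketch in Python (a toy, not anything from the Kimi report): the "model" is a single linear layer `W` followed by softmax over a 3-token vocabulary, scored with the usual cross-entropy loss. The names `softmax` and `batch_loss_and_grad` and the toy batch are hypothetical. For each weight, the gradient comes out as (predicted probability minus 1 for the true token, or just the predicted probability for other tokens) times the input feature, averaged over the batch; a negative value means increasing that weight would reduce the error.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def batch_loss_and_grad(W, batch):
    """Average cross-entropy loss and its gradient w.r.t. W over a batch.

    W: V x D weight matrix (list of rows), one row per vocabulary token.
    batch: list of (x, target) pairs; x is a D-dim context feature vector,
           target is the id of the true next token.
    """
    V, D = len(W), len(W[0])
    grad = [[0.0] * D for _ in range(V)]
    loss = 0.0
    for x, target in batch:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        p = softmax(logits)
        loss -= math.log(p[target])  # penalty for low probability on the true token
        for v in range(V):
            # d(loss)/d(logit_v) = p_v - 1 if v is the true token, else p_v
            err = p[v] - (1.0 if v == target else 0.0)
            for d in range(D):
                grad[v][d] += err * x[d]
    n = len(batch)
    return loss / n, [[g / n for g in row] for row in grad]

# Toy setup: 3-token vocabulary, 2 input features, all weights start at zero.
W = [[0.0, 0.0] for _ in range(3)]
batch = [([1.0, 0.0], 2), ([0.0, 1.0], 0)]
loss, grad = batch_loss_and_grad(W, batch)
print(round(loss, 4))  # 1.0986, i.e. ln(3): every token is equally likely
print(grad[2][0])      # negative: raising W[2][0] makes token 2 more likely for x = [1, 0]
```

Moving every weight a little against its gradient (for example `W[v][d] -= 0.5 * grad[v][d]`) lowers the batch loss; deciding exactly how to take that step is the optimizer's job.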

Once you have the gradients (i.e. the directions), you need to change the actual weights accordingly. But how? That's where the optimizer comes in, and Muon is one of them.

The most straightforward way is simply to change each weight by the scaled-down gradient (you don't apply it fully, otherwise training becomes unstable). But there are cleverer ways. The current standard is called AdamW, which applies not only the current batch's gradients but also a scaled-down version of the previous batches' gradients, so as to keep some "momentum" and smooth out statistical differences across batches.
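A rough sketch of the two approaches, in Python on a flat list of numbers (real frameworks apply the same per-element rule to whole tensors). `sgd_step` is the plain "scaled-down gradient" update. The `AdamW` class follows the AdamW rule as commonly implemented (e.g. PyTorch's `torch.optim.AdamW`, whose defaults the constructor mirrors): besides the momentum average `m`, it keeps an average of squared gradients `v` so each weight gets its own step size, and it applies weight decay separately from the gradient. The class and function names and the toy objective are illustrative assumptions, not anything from the report.

```python
import math

def sgd_step(params, grads, lr=0.1):
    """Plain gradient descent: move each weight against its gradient, scaled down by lr."""
    return [p - lr * g for p, g in zip(params, grads)]

class AdamW:
    """AdamW on a flat list of float parameters (defaults mirror PyTorch's)."""

    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.weight_decay = eps, weight_decay
        self.m = [0.0] * n  # momentum: decaying average of past gradients
        self.v = [0.0] * n  # decaying average of past squared gradients
        self.t = 0          # step counter, used for bias correction

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Correct the averages' bias toward zero during the first steps.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Decoupled weight decay (the "W" in AdamW): shrink the weight directly.
            p -= self.lr * self.weight_decay * p
            # Momentum step, divided by the typical gradient size for this weight.
            p -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            new_params.append(p)
        return new_params

# Usage: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_sgd = [0.0]
for _ in range(100):
    x_sgd = sgd_step(x_sgd, [2 * (x_sgd[0] - 3)])

opt = AdamW(1, lr=0.1, weight_decay=0.0)
x_adam = [0.0]
for _ in range(500):
    x_adam = opt.step(x_adam, [2 * (x_adam[0] - 3)])
print(x_sgd[0], x_adam[0])  # both end up close to 3
```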

The authors claim Muon is better than AdamW, roughly 2 times better: they can reach the same results using half the training tokens. For instance, a model trained on 10T tokens with Muon should match the performance of one trained on 20T tokens with AdamW. This is a big claim and has to be confirmed by independent tests. Training a big model on 10T tokens can take many months on a full datacenter with thousands of GPUs, so the savings could be very big if confirmed.
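The post doesn't say what Muon actually does, so here is a rough sketch based on Muon's public description (the name stands for MomentUm Orthogonalized by Newton-Schulz): keep an SGD-style momentum buffer for each 2D weight matrix, then, before applying it, replace that update's SVD U S V^T with U V^T, so that no single direction dominates the step. Newton-Schulz iterations approximate this without computing an SVD. This pure-Python version is a simplification under stated assumptions, not the Kimi report's code: it uses plain (non-Nesterov) momentum and the classic cubic Newton-Schulz iteration run to convergence, whereas the reference Muon uses a tuned quintic iteration with only a few steps, and practical setups add tweaks such as weight decay and update rescaling while keeping AdamW for embeddings and other non-matrix parameters. The names `orthogonalize` and `muon_step` and the hyperparameters are illustrative.

```python
import math

def matmul(A, B):
    # Naive matrix product of lists-of-rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonalize(G, steps=25):
    """Approximate U @ V^T from the SVD G = U S V^T (all singular values pushed to 1).

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Dividing by the Frobenius norm first keeps every singular value <= 1,
    inside the range where the iteration converges.
    """
    norm = math.sqrt(sum(x * x for row in G for x in row)) + 1e-12
    X = [[x / norm for x in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X

def muon_step(W, G, M, lr=0.02, beta=0.95, ns_steps=25):
    """One simplified Muon update for a single 2D weight matrix W.

    M is the momentum buffer (same shape as W); returns the new (W, M).
    """
    # 1) Ordinary momentum: blend the new gradient into the running buffer.
    M = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(M, G)]
    # 2) Orthogonalize the momentum so every direction gets the same step size.
    O = orthogonalize(M, ns_steps)
    # 3) Move the weights against the orthogonalized update.
    W = [[w - lr * o for w, o in zip(rw, ro)] for rw, ro in zip(W, O)]
    return W, M

# A gradient that is 3x stronger along one axis than the other...
G = [[3.0, 0.0], [0.0, 1.0]]
print(orthogonalize(G))  # ~[[1, 0], [0, 1]]: both directions now get equal weight

W = [[0.5, -0.2], [0.1, 0.3]]
M = [[0.0, 0.0], [0.0, 0.0]]
W, M = muon_step(W, G, M)
```

With plain momentum, the strong direction of the gradient would dominate the step; after orthogonalization both directions move by the same amount, which is the property Muon's advocates credit for its efficiency.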

What assets? All of the source code is open and shared with everyone. There are no patents, trademarks, copyrights, or trade secrets that Deepseek can use to generate a revenue stream to fund R&D and keep itself at the forefront of AI development. So the Chinese government has to keep giving money to and subsidizing Deepseek to keep it competitive, and then Deepseek shares all of its knowledge freely with everyone? Foreign AI companies can monetize Deepseek, comb through the source code and improve it, all while not sharing or contributing anything back to Deepseek. I am all for open source and sharing Deepseek for the betterment of humanity; however, I hope Deepseek can come up with a way to sustain itself without future money/subsidies from the government.

I was thinking DeepSeek's business model is like Google's Android, but now I'm starting to think it resembles Linux more. They aim to be the Linux of AI models.

Is it a business model that makes sense? Your arguments could have been applied verbatim to Linux 30 years ago... and history has already proved them wrong.
 

Biscuits

Colonel
Registered Member
Then what assets are you talking about?

Have any Chinese companies (like Baidu, Tencent, Alibaba, etc.) started giving money to Deepseek since they integrated it into their systems? If not, then this blank cheque is from the Chinese government, right? But why would the Chinese government do that? Anything and everything that is Deepseek, now and in the future, will be shared freely with all American AI companies.
Where tf do you get that from? R1 is a demonstrator for the public to get up interest and convince the biggest investor in the world to choose them.

So if Lockheed Martin flies an airshow in a foreign country, it means anything and everything from that is Lockheed Martin now and in the future will be shared freely to all global arms development companies?

Where has deepseek ever revealed everything about all the projects they have? What they've done is the equivalent of live demonstrating a new type of weapon at a city fair. It says absolutely nil about what projects they have at home.
Eh, Deepseek is getting integrated into those sectors, so is he much wealthier and more powerful now? Is Deepseek getting any money from those integrations?

It is not about more money for himself, it is about sustainability: money for further AI development, money to hire and keep knowledgeable AI workers, money to buy hardware to run AI (so I don't keep getting "service unavailable"/down like right now), money to help developing countries run and use Deepseek for good PR, etc.

Again, it is not about patents, trademarks, copyrights, or trade secrets; it is about sustainability for Deepseek.
If he has successfully convinced the government to help him, his project is set. And based on how many places have been greenlit to use the basic Deepseek model, it seems he was successful.

A lot of AI researchers don't even believe LLM will lead to AGI. I don't have a perspective here as I'm no expert.
 

Xiongmao

Junior Member
Registered Member
Well... physical AI is not far behind...

Unitree robots cost only $16,000 per unit and, once their AI brain is fully trained, can do the work of most low-wage manual laborers.

Musk targets his Optimus bot at $25,000, and it can cook, clean, babysit, mow the lawn, wash dishes, do laundry, run errands.... Gig workers are SOL.....
Musk's Optimus bot is a total scam like most of his other businesses. It can probably do one tenth of what he claims it can do now; in five years' time, perhaps. That's been his track record in all his other ventures.
 

AndrewS

Brigadier
Registered Member
I was thinking DeepSeek's business model is like Google's Android, but now I'm starting to think it resembles Linux more. They aim to be the Linux of AI models.

Is it a business model that makes sense? Your arguments could have been applied verbatim to Linux 30 years ago... and history has already proved them wrong.

Remember that Android is technically a fork of Linux, as it was originally developed from Linux.
 

Eventine

Junior Member
Registered Member
Musk's Optimus bot is a total scam like most of his other businesses. It can probably do one tenth of what he claims it can do now; in five years' time, perhaps. That's been his track record in all his other ventures.
Eh, I wouldn't say that. Elon is relatively successful compared to most founders / CEOs simply because he has money to burn, is a ruthless salesman, and seems to know how to hire/partner with the right people.

99% of startups fail, but his success rate is more like 50%. Successes: SpaceX, Tesla, PayPal, OpenAI. Failures: SolarCity, Zip2, The Boring Company, Hyperloop. Not decided: Neuralink, X, xAI.

The more appropriate description of Elon is that he creates powerful, disruptive companies that he himself often gets managed out of since his skill is in disruption, not in running a long-term successful business. I think the same will be the case for xAI.

Grok 3 is a state-of-the-art model, and for a company that was founded just two years ago it is a great achievement: it shows that he knew how to poach the right talent, set up effective leadership, and quickly create the infrastructure needed.

But it was done by brute force, with hundreds of millions of GPU hours on a $3 billion investment in GPUs alone. Grok 3 isn't really a sustainable business - it was created solely to disrupt OpenAI's business model (particularly Sam's $200/month subscriptions and Anthropic's extreme censorship), and it has two features that are currently making waves across the Western market: 1) a relatively generous usage structure and 2) weak censorship.

You might be wondering why Sam was just recently talking about allowing adult content on ChatGPT. It's because of Grok 3, which is by far the most uncensored Big Model today, and reportedly has an "unhinged" mode and an upcoming voice chat mode that goes even further. Elon identified - as is typical of his style - the move that other industry players were afraid to make. He knows he can't disrupt based on efficiency & value like Chinese companies, but he can do what both Western and Chinese Big Models are currently reluctant to do in AI: adult content for all the lonely men (and women) out there.
 

9dashline

Captain
Registered Member
So, any guesses as to what the 5 gifts (repos) Deepseek is dropping next week will be?

The first one should be out tomorrow or at midnight.
 

Sinofan

Just Hatched
Registered Member
Well, so does everyone else, including those working at OpenAI and on Elon's Grok.



It would work well as long as Deepseek becomes/stays the best of the best AI around. However, OpenAI and Grok can always look through Deepseek's code and copy/improve whatever features they like into their own, without sharing anything back with Deepseek. How fast do you think Deepseek can come up with new features/methods/ideas compared to how fast OpenAI/Grok can copy and improve them?


I think we should remember what the greatest impacts of Deepseek are on a macro level, as that will give us the right perspective on the big picture.

Quote of Louis Gave:
"The release of DeepSeek’s AI models has shattered three deeply held beliefs.

The first is that China can be constrained technologically. After the last few weeks, the view that “China does manufacturing, the US does digital” lies in ruins. This raises the question of how the US will respond. Will it call a truce in the tech war? Or will it crank up the sanctions?

The second shattered belief is that big tech will be able to spend its way into building ever bigger monopolies. In this view of the world, being very large is not a stumbling block to growth (as the history of capitalism teaches), but an advantage. In a world in which AI drives growth, and in which spending billions digs an AI moat, only a handful of companies would have the stakes to sit down at the big boy table. This view is now obsolete.

Finally, the release of DeepSeek’s models reminded investors that spending on semiconductors remains more cyclical than structural."
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member

Alibaba is investing at least 380B RMB over the next 3 years, equivalent to what it spent in the past decade on capex for cloud and AI business. So this still seems small vs what Google and Meta spend, but part of the difference is just the lower building cost and better energy infrastructure in China.
 