Artificial Intelligence thread

european_guy

Junior Member
Registered Member
New Kimi report on Muon scalability. Btw, I have no idea what Muon is


Models are trained in batches. For example, during pre-training many chunks of text are fed in one step and the model has to predict the next token for each of them. The predicted tokens are then compared with the actual ground-truth tokens, and the differences are turned into what are called gradients, which tell you in which direction (increase or decrease) each parameter of the model (its weights) should change to reduce the error, i.e. to better predict the actual tokens next time.
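To make the "compare predictions with ground truth, get gradients" step concrete, here is a minimal NumPy sketch. It is only an illustration under simplifying assumptions: the function name `next_token_loss_and_grad` and the toy numbers are made up, and the gradient is taken with respect to the raw scores (logits) rather than the weights of a real network.

```python
import numpy as np

def next_token_loss_and_grad(logits, targets):
    """Cross-entropy loss for next-token prediction and its gradient w.r.t. the logits.

    logits:  (batch, vocab) raw scores, one row per text chunk in the batch
    targets: (batch,) ground-truth next-token ids
    """
    # Softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    batch = logits.shape[0]
    rows = np.arange(batch)
    # Average negative log-probability assigned to the correct tokens.
    loss = -np.log(probs[rows, targets]).mean()

    # Gradient of that loss: predicted probabilities minus the one-hot truth.
    grad = probs.copy()
    grad[rows, targets] -= 1.0
    grad /= batch
    return loss, grad

# Toy batch: 2 chunks, vocabulary of 3 tokens (hypothetical values).
logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss, grad = next_token_loss_and_grad(logits, targets)
print(loss)
print(grad)
```

The sign of each gradient entry is the "direction": it is negative for the correct token (its score should go up) and positive for the wrong ones (their scores should go down).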

Once you have the gradients (i.e. the directions) you need to change the actual weights accordingly. But how? This is where the optimizer comes in, and Muon is one such optimizer.

The most straightforward way is to simply change each weight by a scaled-down gradient (you don't apply it in full, otherwise training becomes unstable). But there are cleverer ways. The current standard is called AdamW, which applies not only the current batch's gradients but also a scaled-down contribution from previous batches, so as to keep some "momentum" and smooth out statistical differences across batches.
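Roughly, the two update rules look like this in NumPy. This is a sketch, not anyone's production code: `sgd_step` and `adamw_step` are names chosen here, and the default hyperparameters are the commonly used ones rather than anything taken from the report. Note that AdamW, besides the momentum described above, also rescales each parameter's step by a running average of squared gradients and applies weight decay.

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    """Plain gradient descent: move against the gradient, scaled down by lr."""
    return w - lr * g

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update. m and v carry information from previous batches."""
    m = beta1 * m + (1 - beta1) * g       # running average of gradients ("momentum")
    v = beta2 * v + (1 - beta2) * g**2    # running average of squared gradients
    m_hat = m / (1 - beta1**t)            # bias correction for the early steps
    v_hat = v / (1 - beta2**t)
    # Decoupled weight decay: applied to w directly, not mixed into the gradient.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Toy example with two weights (hypothetical values).
w = np.array([1.0, -2.0])
g = np.array([0.5, -0.25])
m = np.zeros_like(w)
v = np.zeros_like(w)

w_sgd = sgd_step(w, g)
w_adamw, m, v = adamw_step(w, g, m, v, t=1, weight_decay=0.0)
print(w_sgd, w_adamw)
```

On the very first step with no weight decay, AdamW moves every weight by about `lr` in the direction opposite to the gradient's sign, regardless of the gradient's size, whereas plain SGD moves each weight in proportion to its gradient.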

The authors claim Muon is better than AdamW, roughly 2 times better, meaning they can reach the same results with half the training tokens. For instance, a model trained on 10T tokens with Muon should match the performance of one trained on 20T tokens with AdamW. This is a big claim and has to be confirmed by independent tests. Training a big model on 10T tokens can take many months on a full datacenter with thousands of GPUs, so the savings could be very big if confirmed.
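For a feel of the scale, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model size, GPU count and sustained throughput are hypothetical, and the cost model is the common rule of thumb of about 6 FLOPs per parameter per training token for dense transformers. None of these numbers come from the report.

```python
def training_days(params, tokens, num_gpus, flops_per_gpu):
    """Rough training time in days, using the ~6 * params * tokens FLOPs rule of thumb."""
    total_flops = 6 * params * tokens
    seconds = total_flops / (num_gpus * flops_per_gpu)
    return seconds / 86_400  # seconds per day

# Hypothetical setup, NOT taken from the report.
PARAMS = 100e9          # 100B-parameter dense model
NUM_GPUS = 2_000
FLOPS_PER_GPU = 4e14    # assumed sustained FLOP/s per GPU

days_muon = training_days(PARAMS, 10e12, NUM_GPUS, FLOPS_PER_GPU)   # 10T tokens
days_adamw = training_days(PARAMS, 20e12, NUM_GPUS, FLOPS_PER_GPU)  # 20T tokens
print(f"10T tokens: {days_muon:.0f} days, 20T tokens: {days_adamw:.0f} days")
```

Whatever numbers you plug in, the time scales linearly with the token count, so halving the tokens needed halves the datacenter time.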

What assets? All of the source code is open and shared with everyone. There are no patents, trademarks, copyrights, or trade secrets that DeepSeek can use to earn a revenue stream to fund the R&D that keeps it at the forefront of AI development. So the Chinese government has to keep giving money and subsidizing DeepSeek to keep it competitive, and then DeepSeek shares all of its knowledge freely with everyone? Foreign AI companies can monetize DeepSeek, comb through the source code and improve it, all while not sharing or contributing anything back to DeepSeek. I am all for open source and sharing DeepSeek for the betterment of humanity, but I hope DeepSeek can come up with a way to sustain itself without future money/subsidies from the government.

I was thinking DeepSeek's business model is like Google's Android, but now I'm starting to think their business model more closely resembles Linux. They aim to be the Linux of AI models.

Is it a business model that makes sense? Your arguments could have been applied verbatim to Linux 30 years ago... and history has already proved them wrong.
 

Biscuits

Colonel
Registered Member
Then what assets are you talking about?

Have any Chinese companies (like Baidu, Tencent, Alibaba, etc.) started giving money to DeepSeek since they integrated it into their systems? If not, then this blank cheque is from the Chinese government, right? But why would the Chinese government do that? Anything and everything that is DeepSeek now and in the future will be shared freely with all American AI companies.
Where tf do you get that from? R1 is a demonstrator for the public, meant to drum up interest and convince the biggest investor in the world to choose them.

So if Lockheed Martin flies at an airshow in a foreign country, does it mean anything and everything that is Lockheed Martin now and in the future will be shared freely with all global arms development companies?

Where has DeepSeek ever revealed everything about all the projects they have? What they've done is the equivalent of a live demonstration of a new type of weapon at a city fair. It says absolutely nil about what projects they have at home.
Eh, DeepSeek is getting integrated into those sectors, so is he much more wealthy and powerful now? Is DeepSeek getting any money from those integrations?

It is not about more money for himself, it is about sustainability: money to fund future AI development, money to hire and keep knowledgeable AI workers, money to buy hardware to run AI (so I don't keep getting service unavailable/down like right now), money to help developing countries run and use DeepSeek for good PR, etc.

Again, it is not about patents, trademarks, copyrights, or trade secrets, it is about sustainability for DeepSeek.
If he has successfully convinced the government to help him, his project is set. And based on how many places have been greenlit to use the basic DeepSeek model, it seems he was successful.

A lot of AI researchers don't even believe LLMs will lead to AGI. I don't have a perspective here as I'm no expert.
 

Xiongmao

Junior Member
Registered Member
Well... physical AI is not far behind...

Unitree costs only $16,000 per unit; once its AI brain is fully trained, it can do the work of most low-wage manual labor.

Musk targets his Optimus bot at $25,000 and it can cook, clean, babysit, mow the lawn, wash dishes, do laundry, run errands.... Gig workers are SOL.....
Musk's Optimus bot is a total scam like most of his other businesses. It can probably do one tenth of what he claims it can do now; in five years' time, perhaps. That's been his track record in all his other ventures.
 

AndrewS

Brigadier
Registered Member
I was thinking DeepSeek's business model is like Google's Android, but now I'm starting to think their business model more closely resembles Linux. They aim to be the Linux of AI models.

Is it a business model that makes sense? Your arguments could have been applied verbatim to Linux 30 years ago... and history has already proved them wrong.

Remember that Android is technically built on Linux: it runs on a modified version of the Linux kernel.
 