If they had a larger cluster with 20K more GPUs, they would have crushed OpenAI, Anthropic, etc.
And just why do they need a 100k H100 GPU cluster to stay competitive? Alibaba just trained its latest Qwen 2.5 on 18 trillion tokens
It got released just a few months after Qwen-2.0.
There are two constraints in AI right now: AI GPUs and human-labeled data for RLHF (in the future, RLHF will act as an aid for RLAIF)
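To make that RLHF-to-RLAIF handoff concrete, here is a rough Python sketch of the idea: a small human-labeled preference set is used to sanity-check an AI "judge", which then labels the bulk of the data. The `ai_judge_prefers` function and all of the example data are hypothetical placeholders of my own, not anything any lab has published.

```python
# Sketch: RLHF as an aid for RLAIF.
# A small human-labeled seed set validates an AI "judge", which then labels
# a much larger pool cheaply. ai_judge_prefers is a placeholder, NOT a real API.

def ai_judge_prefers(prompt: str, answer_a: str, answer_b: str) -> str:
    # Placeholder heuristic: prefer the longer answer.
    # In practice an LLM would score both answers against the prompt.
    return answer_a if len(answer_a) >= len(answer_b) else answer_b

# Small, expensive human-labeled seed set: (prompt, answer_a, answer_b, preferred)
human_seed = [
    ("Explain overfitting", "It's bad.",
     "Overfitting is when a model memorizes training noise instead of the signal.", "b"),
]

# Step 1 (RLHF as an aid): check the judge against human labels before trusting it
agreement = sum(
    ai_judge_prefers(p, a, b) == (b if pref == "b" else a)
    for p, a, b, pref in human_seed
) / len(human_seed)
print(f"judge/human agreement on seed set: {agreement:.0%}")

# Step 2 (RLAIF): the judge labels a larger unlabeled pool at scale
unlabeled_pool = [
    ("What is entropy?", "Entropy measures the uncertainty of a distribution.", "idk"),
]
ai_preferences = [(p, ai_judge_prefers(p, a, b)) for p, a, b in unlabeled_pool]
print(ai_preferences)
```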
Every AI firm (the big US firms, Cohere, Mistral, and every AI firm in China) is using synthetic data generation to produce more tokens for training. How many non-duplicate tokens are out there globally that you can actually use? How much larger do their clusters really need to get?
You just need to make sure that the synthetic data is diverse and has high entropy
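What that check could look like in practice is something like the sketch below: distinct-n as a crude diversity score and unigram Shannon entropy as a crude entropy score. The metric choices and sample texts are my own illustration, not a method any of these labs has described.

```python
# Crude checks on a batch of synthetic text: n-gram diversity and token entropy.
import math
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of unique n-grams across the batch (higher = more diverse)."""
    ngrams = []
    for t in texts:
        toks = t.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

def token_entropy(texts):
    """Shannon entropy (bits) of the unigram token distribution."""
    counts = Counter(tok for t in texts for tok in t.split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    synthetic_batch = [
        "The cat sat on the mat.",
        "A dog chased the ball across the park.",
        "Quantum tunnelling lets particles cross energy barriers.",
    ]
    print(f"distinct-2 diversity: {distinct_n(synthetic_batch):.3f}")
    print(f"token entropy: {token_entropy(synthetic_batch):.2f} bits")
```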
The Qwen team hired lots of human data annotators for RLHF
OpenAI gives you free access to GPT-3.5/GPT-4o mini so that they can use your interactions with the model to train their models
They use their users for synthetic data generation
The data wall doesn't exist anymore
Chinese firms are trying to compensate for their lack of GPUs with better quality data
OpenAI & Anthropic will have huge amounts of GPUs along with quality data
We haven't even seen the full power of o1 yet
They just forced the model to think step by step at inference time and trained it on RL fine-tuned datasets
No MCTS or tree search was involved during inference (they don't have enough compute to serve MCTS to millions of users)
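To see why serving tree search to millions of users is so much harder than serving one linear chain of thought, here is a back-of-envelope sketch. Every number in it (chain length, branching factor, depth, tokens per node) is an assumption for illustration, not an OpenAI figure.

```python
# Back-of-envelope: tokens generated per query for one linear chain of thought
# vs. a hypothetical tree search. All numbers are illustrative assumptions.

cot_tokens = 2_000            # one linear chain-of-thought per query (assumed)
branching, depth = 4, 5       # hypothetical tree-search branching factor and depth
tokens_per_node = 400         # assumed tokens generated per explored node

tree_nodes = sum(branching ** d for d in range(1, depth + 1))  # 4 + 16 + ... + 1024
tree_tokens = tree_nodes * tokens_per_node

print(f"single CoT chain : {cot_tokens:>10,} tokens/query")
print(f"tree search      : {tree_tokens:>10,} tokens/query "
      f"({tree_tokens / cot_tokens:.0f}x more inference compute)")
```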
AI has become a compute/GPU game now
The more compute you have, the more you can experiment, the bigger the models you can make, and the more inference-time compute you can use
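For a rough sense of how cluster size bounds model size, here is a back-of-envelope training-time estimate using the common C ≈ 6·N·D FLOPs rule of thumb (N = parameters, D = training tokens). The cluster throughput, utilization, and model/token counts are assumptions for illustration, not reported figures.

```python
# Rough training-time estimate with the C ≈ 6 * N * D rule of thumb.
# Cluster assumptions: 100k H100s at ~1 PFLOP/s BF16 each, ~40% utilization.

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

cluster_flops_per_s = 100_000 * 1e15 * 0.4  # assumed sustained throughput

for params, tokens in [(70e9, 15e12), (400e9, 18e12)]:
    days = train_flops(params, tokens) / cluster_flops_per_s / 86_400
    print(f"{params / 1e9:.0f}B params on {tokens / 1e12:.0f}T tokens: "
          f"~{days:.1f} days on a 100k-GPU cluster")
```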