Artificial Intelligence thread

diadact

New Member
Registered Member
And just why do they need a 100k H100 GPU cluster to stay competitive? Alibaba just trained its latest Qwen 2.5 on 18 trillion tokens.

It got released just a few months after Qwen-2.0.
If they had a larger cluster with 20K more GPUs, then they would have crushed OpenAI, Anthropic, etc.
There are two constraints in AI right now: AI GPUs and human-labeled data for RLHF (in the future, RLHF will act as an aid for RLAIF).

How many non-duplicate tokens are out there globally that you can use? How much larger do their clusters really need to get?
Every AI firm (the big US firms, Cohere, Mistral, and every AI firm in China) is using synthetic data generation to generate more tokens for training.
You just need to make sure that the synthetic data is diverse and has high entropy (a rough sketch of one way to check this is at the end of this post).
Qwen hired lots of human data annotators for RLHF.
OpenAI gives you free access to GPT-3.5/GPT-4o mini so that they can use user interactions with the model to train their models.
They use their users for synthetic data generation.
The data wall doesn't exist anymore.
Chinese firms are trying to compensate for their lack of GPUs with better-quality data.
OpenAI and Anthropic will have huge amounts of GPUs along with quality data.
We haven't even seen the full power of o1 yet.
They just forced the model to think step by step at inference and trained it on RL fine-tuned datasets.
No MCTS or tree search was involved during inference (they don't have enough compute to serve MCTS to millions of users).
AI has become a compute/GPU game now.
The more compute you have, the more you can experiment, the bigger models you can make, and the more inference-time compute you can use.
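As a rough illustration of the diversity/high-entropy point above, here is a minimal sketch of one way a batch of synthetic text could be screened before it is mixed into training data; the n-gram size and the toy batches are illustrative assumptions, not anything Qwen or OpenAI has published.

```python
# Minimal sketch: compare the diversity of two "synthetic" batches using
# character n-gram entropy as a crude proxy. All numbers are illustrative.
import math
from collections import Counter

def ngram_entropy_bits(texts, n=3):
    """Shannon entropy (in bits) of the character n-gram distribution of a batch."""
    counts = Counter()
    for t in texts:
        for i in range(len(t) - n + 1):
            counts[t[i:i + n]] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A collapsed, repetitive batch versus a more varied one.
repetitive = ["The answer is 42."] * 1000
varied = [f"Sample {i}: a different sentence about topic number {i}." for i in range(1000)]

print(f"repetitive batch: {ngram_entropy_bits(repetitive):.2f} bits")
print(f"varied batch:     {ngram_entropy_bits(varied):.2f} bits")
# The repetitive batch scores far lower; a real pipeline would flag or downweight it.
```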


 

Eventine

Junior Member
Registered Member
Seems like a fast-follower mentality, which is probably expected of profit-guided companies like Alibaba. What sets OpenAI above the competition is its mission-driven culture. From talking to people inside the company, money is not the main motivation; rather, it's the deeply held belief that they're on the edge of revolutionary breakthroughs in Artificial General Intelligence, which will fundamentally change the course of human history.

Rumor has it that much of the talent at OpenAI actually took demotions to join the company. I've heard of people who were former directors and VPs at Google, Facebook, etc. joining as regular managers and employees. The passion and sense of mission, from all indications, is off the charts. That seems to be a missing quality among their competitors, who look more like they just want to keep up / cash in.

We can fault the West for many things, but when it comes to a single-minded, obsessive-compulsive, mission-driven mentality, it feels like they still have the edge over Chinese technology leaders, who tend to be more money-motivated.
 

diadact

New Member
Registered Member
Seems like a fast-follower mentality, which is probably expected of profit-guided companies like Alibaba. What sets OpenAI above the competition is its mission-driven culture. From talking to people inside the company, money is not the main motivation; rather, it's the deeply held belief that they're on the edge of revolutionary breakthroughs in Artificial General Intelligence, which will fundamentally change the course of human history.
Considering the performance of Qwen 2.5, I would disagree with that statement.
If they had more compute, they would have crushed Anthropic, DeepMind/Google, and OpenAI.
OpenAI can behave like that because Microsoft and other investors are willing to bankroll them.

Rumor has it that much of the talent at OpenAI actually took demotions to join the company. I've heard of people who were former directors and VPs at Google, Facebook, etc. joining as regular managers and employees. The passion and sense of mission, from all indications, is off the charts. That seems to be a missing quality among their competitors, who look more like they just want to keep up / cash in.

We can fault the West for many things, but when it comes to a single-minded, obsessive-compulsive, mission-driven mentality, it feels like they still have the edge over Chinese technology leaders, who tend to be more money-motivated.
Deepseek and Zhipu in China have this mentality.
Deepseek bankrolls its AI development through its parent company, High Flyer, a quant trading firm.
If AGI/ASI development requires billions or trillions of dollars, then you will have to generate revenue.

The Chinese government will have to get involved in funding at some point in the future, just like the US government will.

I personally think that if China wants to make better models than the big US labs, it should just build a 100K H100-class GPU cluster for Deepseek and let them do their job.
 

Eventine

Junior Member
Registered Member
I'm not questioning the practical performance of Chinese models, but if you read the Qwen 2.5 researchers' statements above you can tell they're not doing any groundbreaking research, just refining existing practices. The final sentence about being shocked by OpenAI's breakthrough in chain-of-thought reasoning is especially telling. From decades in the industry and academia, I'm painfully aware of East Asia's tendency to focus on practical, incremental improvements over basic research, and this is just another instance of it.

Often I wonder how much of this is due to culture. People in the West are far more intense and psychotic about their beliefs, but that same psychosis seems to drive them to pursue subjects with single-minded devotion, even if it ruins them.
 

diadact

New Member
Registered Member
Qwen 2.5 researchers' statements above you can tell they're not doing any groundbreaking research, just refining existing practices.
Nobody has released something magical in AI right now (they may be doing something groundbreaking internally, but it hasn't been released).
The OpenAI o1 model is CoT with some RL fine-tuning (every serious lab in the world, US or China, is doing something similar; other models with this ability will be released publicly in weeks or months).

OpenAI has a paper-thin moat; otherwise they wouldn't hide o1's reasoning tokens from their users (a rough sketch of the idea is at the end of this post).

The final sentence about being shocked by OpenAI's breakthrough in chain-of-thought reasoning is especially telling
Junyang was being humble here.
Alibaba has released papers related to MCTS and CoT.
The next release will have this.
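To make the "CoT at inference, reasoning hidden from the user" point concrete, here is a minimal sketch; the prompt, the ANSWER: delimiter, and the model name are illustrative assumptions, and this is not a claim about how o1 is actually implemented.

```python
# Minimal sketch: ask a chat model to reason in a scratchpad, then show the user
# only the final answer, hiding the "reasoning tokens". Assumes the openai
# Python client (v1+); prompt and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Think through the problem step by step inside <scratchpad>...</scratchpad>, "
    "then give the final answer on a new line starting with 'ANSWER:'."
)

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works for the sketch
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    text = resp.choices[0].message.content
    # Show the user only what follows ANSWER:, keeping the scratchpad hidden.
    return text.split("ANSWER:", 1)[-1].strip() if "ANSWER:" in text else text

print(answer("A train travels 120 km in 1.5 hours. What is its average speed?"))
```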
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
If they had a larger cluster with 20K more GPUs, then they would have crushed OpenAI, Anthropic, etc.
There are two constraints in AI right now: AI GPUs and human-labeled data for RLHF (in the future, RLHF will act as an aid for RLAIF).

They probably have far more GPUs than that. Remember, they got a 50k H800 delivery in 2023 alone. And this doesn't count all the GPUs before that, the H100s smuggled into China, or the large Ascend purchases they've put in.

Every AI firm (the big US firms, Cohere, Mistral, and every AI firm in China) is using synthetic data generation to generate more tokens for training.
You just need to make sure that the synthetic data is diverse and has high entropy.
Qwen hired lots of human data annotators for RLHF.
OpenAI gives you free access to GPT-3.5/GPT-4o mini so that they can use user interactions with the model to train their models.
They use their users for synthetic data generation.
The data wall doesn't exist anymore.
Chinese firms are trying to compensate for their lack of GPUs with better-quality data.
OpenAI and Anthropic will have huge amounts of GPUs along with quality data.
We haven't even seen the full power of o1 yet.
They just forced the model to think step by step at inference and trained it on RL fine-tuned datasets.
No MCTS or tree search was involved during inference (they don't have enough compute to serve MCTS to millions of users).
AI has become a compute/GPU game now.
The more compute you have, the more you can experiment, the bigger models you can make, and the more inference-time compute you can use.
Yes, I used GPT-4o for work for 10 hours yesterday. In fact, it's my job.
And I want to strangle GPT-4o, because it cannot follow basic directions.
I can tell you that anyone who thinks AI is sentient hasn't been using GPT-4o for basic tasks.

I'm personally quite dubious of synthetic data.

Right now, this entire industry is so much less sexy than people think. It's all about fixing human data into a more consumable format and getting rid of low-quality stuff so that your training and in-context use can gain "more intelligence" and such (a rough sketch of that kind of filtering is at the end of this post).

Now, I think using AI to sort out existing bad datasets and make them better makes a lot of sense. I'm not convinced that data generated by AI will provide that much additional intelligence.

From what I can see, all that does is harden the parameter weights the model would have learned without it.
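As a rough illustration of that kind of cleanup, here is a minimal sketch of heuristic filtering (exact-duplicate removal plus simple quality rules); the thresholds are illustrative assumptions, not anyone's production pipeline.

```python
# Minimal sketch of heuristic data cleaning: deduplicate and drop obviously
# low-quality records before training. Thresholds are illustrative assumptions.
import hashlib

def clean(records: list[str]) -> list[str]:
    seen = set()
    kept = []
    for text in records:
        text = " ".join(text.split())                # normalize whitespace
        digest = hashlib.sha1(text.lower().encode()).hexdigest()
        if digest in seen:                           # exact-duplicate filter
            continue
        if len(text) < 200:                          # too short to be useful
            continue
        letters = sum(c.isalpha() for c in text)
        if letters / len(text) < 0.6:                # mostly markup/numbers/noise
            continue
        seen.add(digest)
        kept.append(text)
    return kept

docs = ["Some long article text ... " * 20, "spam 123 !!!", "Some long article text ... " * 20]
print(len(clean(docs)), "of", len(docs), "kept")     # expect: 1 of 3 kept
```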
 

diadact

New Member
Registered Member
And I want to strangle GPT-4o, because it cannot follow basic directions.
Have you tested o1?
Bear in mind that o1-preview is elementary.
No tree search is happening at inference; they are just using CoT.
Tree search will require lots of compute to serve millions of users, which they don't have right now (see the rough cost comparison at the end of this post).
CoT and MCTS plus agents will solve the direction-following problem.
OpenAI will release Orion in Q4 2024 or Q1 2025.
Claude 3.5 Opus, Gemini 2, and Claude 4 will have agentic capabilities.
AI has become a compute game.
The one who has more compute will win.
I'm personally quite dubious of synthetic data
Sensetime v5, Claude 3, Deepseek v2/v2.5, GPT-4o, Qwen 2/2.5, and Gemini 1.5 Pro were trained on synthetic data.
If synthetic data were not working, we would have seen model collapse due to low entropy.
Every major lab knows how to maintain data diversity and high entropy while using synthetic data.
Qwen's 18T-token training run included lots of high-quality synthetic tokens.
It's all about fixing human data into a more consumable format and getting rid of low-quality stuff
All the top labs have solved this issue, be it in China, the US, or France.
I'm not convinced that data generated by AI will provide that much additional intelligence.
This is where human-annotated and human-generated data comes in.
Scale AI does this.
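To put some rough numbers on the "tree search is far more expensive to serve than plain CoT" claim above, here is a back-of-the-envelope comparison; all of the token counts are illustrative assumptions, not OpenAI's actual serving profile.

```python
# Back-of-the-envelope inference cost per query; every number is illustrative.
cot_tokens = 2_000                                 # one chain-of-thought trace

n = 16                                             # best-of-n / self-consistency samples
best_of_n_tokens = n * cot_tokens

rollouts, depth, step_tokens = 100, 10, 200        # toy tree search: 100 rollouts of 10 steps
tree_search_tokens = rollouts * depth * step_tokens

for name, tokens in [("single CoT", cot_tokens),
                     ("best-of-16", best_of_n_tokens),
                     ("toy tree search", tree_search_tokens)]:
    print(f"{name:>16}: {tokens:>8,} generated tokens per query "
          f"({tokens / cot_tokens:.0f}x a single CoT pass)")
```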
 

9dashline

Captain
Registered Member
US AI is mostly scams; OpenAI is a CIA/Mossad front.

o1 is nothing special; after testing it some more, it's not any smarter than 4o. It just checks its work and steps through the process, which a chained LLM can do just as well.

Kling is already king; where is Sora?
And while Meta tried to rug-pull OpenAI by releasing Llama 3 as open weights, Qwen 2.5 has already bested Llama and rug-pulled everyone.

Reflect on this, pardon the pun.

 

european_guy

Junior Member
Registered Member
And just why do they need a 100k H100 GPU cluster to stay competitive? Alibaba just trained its latest Qwen 2.5 on 18 trillion tokens.

It got released just a few months after Qwen-2.0.

What is most impressive to me is that with a 70B model trained on 18T tokens they reached the performance of the 405B Llama model trained on 15T tokens.

For (dense) Transformer models, computation per token is proportional to the number of parameters; this means that Qwen 70B reached the same performance as Llama 405B with roughly a 5 times smaller computing budget!
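A quick sanity check of that factor, using the common training-compute rule of thumb FLOPs ≈ 6 × parameters × tokens (a standard approximation, not a figure from either model's report):

```python
# Rough training-compute comparison using the FLOPs ≈ 6 * N * D rule of thumb,
# where N = parameters and D = training tokens (counts as stated above).
def train_flops(params, tokens):
    return 6 * params * tokens

llama = train_flops(405e9, 15e12)   # Llama 3 405B on 15T tokens
qwen  = train_flops(70e9, 18e12)    # Qwen 2.5 70B on 18T tokens

print(f"Llama 405B: {llama:.2e} FLOPs")
print(f"Qwen 70B:   {qwen:.2e} FLOPs")
print(f"ratio: {llama / qwen:.1f}x")   # ~4.8x, i.e. roughly 5x less compute
```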

This is even more impressive considering that Llama 3 is only a few months old and that the Meta team behind Llama is world class; they are state of the art.

For about two years there has been a trend toward better training techniques applied to smaller models; this indirectly helps China work around GPU limitations for the time being.

The 7B-70B model range is here to stay; these models will be the workhorses of future applications, even more so once the new breed of reasoning models (which is actually 90% new training techniques) spreads, so that even a small 7B model will gain "reasoning" capabilities.

BTW, in the last 18 months papers on eliciting reasoning in LLMs have sprung up like mushrooms. Many on the internet say that OpenAI o1 is based on one particular paper ... the other big players are months away from OpenAI, not years away.
 

9dashline

Captain
Registered Member
What is most impressive to me is that with a 70B model trained on 18T tokens they reached the performance of the 405B Llama model trained on 15T tokens.

For (dense) Transformer models, computation per token is proportional to the number of parameters; this means that Qwen 70B reached the same performance as Llama 405B with roughly a 5 times smaller computing budget!

This is even more impressive considering that Llama 3 is only a few months old and that the Meta team behind Llama is world class; they are state of the art.

For about two years there has been a trend toward better training techniques applied to smaller models; this indirectly helps China work around GPU limitations for the time being.

The 7B-70B model range is here to stay; these models will be the workhorses of future applications, even more so once the new breed of reasoning models (which is actually 90% new training techniques) spreads, so that even a small 7B model will gain "reasoning" capabilities.

BTW, in the last 18 months papers on eliciting reasoning in LLMs have sprung up like mushrooms. Many on the internet say that OpenAI o1 is based on one particular paper ... the other big players are months away from OpenAI, not years away.
That's for sure the right paper, thanks. It's even titled Qstar, lol.
 