Artificial Intelligence thread

diadact

New Member
Registered Member
BTW, in the last 18 months papers on eliciting reasoning in LLMs have sprung up like mushrooms. Many posts on the internet say that OpenAI o1 is based on one of these papers... the other big players are months away from OpenAI, not years away.
Yeah, it is based on the STaR method.
Many of the authors of that paper are at xAI (Elon's company) now.
What is most impressive to me is that with a 70B model trained on 18T tokens they reached the performance of the 405B Llama model trained on 15T tokens.
Qwen 2.5 has better performance due to more data: a better-filtered, bigger corpus and more extensive RLHF data annotation.
If only they had a bigger cluster and more compute, they would have crushed 4o and Sonnet 3.5 as well.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Have you tested o1?
Bear in mind that o1-preview is elementary.
No tree search is happening at inference; they are just using CoT.
Tree search would require lots of compute to serve millions of users, which they don't have right now.
CoT and MCTS plus agents will solve the instruction-following problem.
OpenAI will release Orion in Q4 2024 or Q1 2025.
Claude 3.5 Opus, Gemini 2, and Claude 4 will have agentic capabilities.
AI has become a compute game.
The one who has more compute will win.
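A minimal sketch of the distinction above: plain CoT samples a single reasoning chain, while even the crudest form of search (best-of-n with self-consistency voting, standing in here for real tree search) multiplies inference compute by n. This assumes the openai Python SDK; the model name and prompt format are placeholders, not anything OpenAI has confirmed about o1.

```python
# Toy sketch: one CoT chain vs. best-of-n sampling with majority voting.
# Assumes the `openai` SDK and an OPENAI_API_KEY; model name is illustrative.
from collections import Counter
from openai import OpenAI

client = OpenAI()
QUESTION = "A train travels 120 km in 1.5 hours. What is its average speed?"

def cot_once() -> str:
    """One chain-of-thought sample: a single generation's worth of compute."""
    r = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user",
                   "content": QUESTION + "\nThink step by step, then answer."}],
    )
    return r.choices[0].message.content

def best_of_n(n: int = 8) -> str:
    """Sample n chains, keep the most common final answer.
    Inference cost is ~n times the single-chain cost, which is exactly
    the serving-compute problem the post above is pointing at."""
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": QUESTION + "\nThink step by step. End with 'ANSWER: <value>'."}],
        n=n,
        temperature=0.8,
    )
    finals = [c.message.content.rsplit("ANSWER:", 1)[-1].strip()
              for c in r.choices]
    return Counter(finals).most_common(1)[0][0]
```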
So we are constantly trying out different LLMs for our applications.
We've found OpenAI's stuff to be the best thus far.
Again, things are improving. Some things that LLMs didn't do as well 6 months ago are now working much better.

Everyone makes a claim about how the latest LLM is the greatest ever and is reaching AGI.
Whatever.
As I said, I test the latest available GPT-4o models to automate stuff and I want to strangle them.

SenseTime V5, Claude 3, DeepSeek V2/V2.5, GPT-4o, Qwen 2/2.5, and Gemini 1.5 Pro were all trained with synthetic data.
If synthetic data were not working, we would have seen model collapse due to low entropy.
Every major lab knows how to maintain data diversity and high entropy while using synthetic data.
Qwen, trained on 18T tokens, had lots of high-quality synthetic tokens.
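A rough illustration of what "maintaining diversity and high entropy" can mean in practice, assuming token-level Shannon entropy and n-gram duplication as crude proxies (real lab pipelines are far more sophisticated):

```python
# Crude diversity checks for a synthetic-data mix: token entropy and
# 3-gram duplication rate. Proxies only; actual pipelines are far richer.
import math
from collections import Counter

def token_entropy(texts: list[str]) -> float:
    """Shannon entropy (bits) of the whitespace-token distribution."""
    counts = Counter(tok for t in texts for tok in t.split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def trigram_dup_rate(texts: list[str]) -> float:
    """Fraction of 3-grams that are repeats; a spike suggests collapse."""
    grams = Counter()
    for t in texts:
        toks = t.split()
        grams.update(zip(toks, toks[1:], toks[2:]))
    total = sum(grams.values())
    return 0.0 if total == 0 else 1 - len(grams) / total

# A synthetic batch whose entropy drops or duplication spikes relative to a
# human-written baseline would be a warning sign of low-entropy collapse.
```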

All the top labs have solved this issue, be it in China, the US, or France.

This is where human-annotated and human-generated data comes in.
Scale AI does this.
Sure, they all have some synthetic data, which in my mind gets rid of some of the lower-quality data out there and cleans it up a little bit.

But there is a difference between using half real data and half synthetic data,

vs. 10% real data and 90% data generated from an older AI.

Fundamentally, the AI models we have right now are just prediction models for the next token. If 90% of the tokens are generated using an older prediction model, how is the new model going to be significantly better?
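A toy way to see the concern: a student model fit by maximum likelihood on a teacher's samples converges to the teacher's own next-token distribution, not to anything better. Single fixed context, numpy only:

```python
# Why "90% of tokens from an older model" is limiting, in miniature: the
# student's best fit to teacher-generated data IS the teacher's distribution.
import numpy as np

rng = np.random.default_rng(0)
vocab = 5
teacher = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # teacher's p(next token)

samples = rng.choice(vocab, size=100_000, p=teacher)            # "synthetic corpus"
student = np.bincount(samples, minlength=vocab) / len(samples)  # MLE fit

print("teacher:", teacher)
print("student:", np.round(student, 3))  # ~= teacher: imitation, no gain
```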
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
What is most impressive to me is that with a 70B model trained on 18T tokens they reached the performance of the 405B Llama model trained on 15T tokens.

For (dense) Transformer models, computation per token is proportional to the number of parameters; this means that Qwen 70B reached the same performance as Llama 405B with a roughly 5-times-smaller compute budget!

This is even more impressive considering that Llama 3 is only a few months old and that the Meta team behind Llama is world-class; they are the state of the art.

For about two years now there has been a trend toward better training techniques applied to smaller models; this indirectly helps China a great deal in working around GPU limitations for the time being.

The 7B-70B model range is here to stay; these models will be the workhorses of future applications, even more so once the new breed of reasoning models (which is really 90% new training techniques) spreads, so that even a small 7B model gains "reasoning" capabilities.


See this chart for a 10 EFLOPS data center and computation time.

[Chart: model training time vs. parameter count and token count for a 10 EFLOPS data center]

So for 15T tokens: the difference in training computation between a 300B-parameter and a 70B-parameter model is about 4-fold. So yeah, you are probably right on the training resource difference.

Which, btw, indicates one thing: once you have a certain number of tokens, there is something like an optimal number of parameters. Having too many parameters doesn't make the model significantly better.

Someone told me a while back that the ideal ratio of tokens to parameters is 20:1. But it seems like the ratio should be much larger than that, since for Qwen 2.5 it was around 250:1 (18T tokens over 70B parameters).
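For concreteness, here is the back-of-envelope arithmetic behind the last few posts, using the common ~6·N·D FLOPs heuristic for dense-transformer training (N = parameters, D = tokens); the heuristic is approximate, and the cluster utilization figure is an assumption:

```python
# Back-of-envelope training compute, using the ~6*N*D FLOPs heuristic.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

qwen  = train_flops(70e9, 18e12)    # Qwen 2.5 70B-class run
llama = train_flops(405e9, 15e12)   # Llama 3.1 405B run
print(f"Llama/Qwen compute ratio: {llama / qwen:.1f}x")  # ~4.8x, the "5 times"

# Training time on a 10 EFLOPS cluster at an assumed 40% utilization:
cluster, util = 10e18, 0.40         # utilization figure is an assumption
print(f"Qwen-like run: {qwen / (cluster * util) / 86400:.0f} days")

# Tokens-to-parameters ratios mentioned above:
print(f"Chinchilla-style: 20:1, Qwen 2.5: {18e12 / 70e9:.0f}:1")  # ~257:1
```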
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Alright, so I personally tried o1-preview a little bit this morning. The results are better. However, it is so slow. You guys can try it on the website. I cannot believe how much longer it takes vs. GPT-4o. On top of that, it doesn't support max_tokens.
 

siegecrossbow

General
Staff member
Super Moderator

Hilarious stuff: an "ancient agriculture" large model is being developed. I didn't know you needed AI to study agricultural history.

The end goal may not pertain to ancient agriculture but rather to using a novel form of language for LLM prompting. Ideally, a well-written chatbot prompt needs to be concise yet information-dense and precise in meaning. Which language actually qualifies under those criteria? 文言文, or Classical Chinese, does! This may end up having a significant impact on model performance, on top of providing liberal arts majors a stable career prospect.
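The token-density claim is easy to probe empirically. A sketch using the real tiktoken library; the sample sentences are illustrative inventions, and note the caveat in the comments:

```python
# Rough check of the "information density" claim: count tokens for an English
# instruction vs. a Classical-Chinese-style rendering of the same request.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era tokenizer

english = "Summarize the following text in three sentences, preserving all numbers."
classical = "以三句概括下文，数字悉存。"  # terse Classical-Chinese-style rendering

for label, text in [("English", english), ("Classical", classical)]:
    print(label, len(enc.encode(text)), "tokens")
# Caveat: BPE tokenizers trained mostly on English often spend MORE tokens per
# Chinese character, so character-level brevity need not mean fewer tokens.
```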
 

Engineer

Major
Not the first time Classical Chinese has been used in a computational context.
[Image: a rendering of a program written in wenyan-lang to draw the Mandelbrot set]
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
A good interview here with Junyang Lin, the guy in charge of the Qwen project on GitHub.

Basically, I think it's worth listening to how they built the models, the considerations they weighed, and how they took community feedback to improve the models.

It seems to me that validating models just takes so much effort, and people really underestimate how much time it requires.

He also mentioned how synthetic data is used. Basically, it's only really used in the math and coding models. For example, previously people would create math problems, have Qwen 2 generate solutions, and then feed those in as training data for the Qwen 2.5 model.

But to me, in that case, you'd still need to validate the data, because there is still so much hallucination in LLMs.

That to me is a huge problem with synthetic data, or with adding more data in general: how do you classify it properly? How do you make sure the data is meaningful? We are still at the point where hyperscalers can find more quality data and build better models from it. What happens when we get to the point where we can no longer get enough quality data?
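A sketch of the generate-then-validate loop described above, assuming an OpenAI-compatible endpoint; the model name, answer format, and helper names are hypothetical, not the Qwen team's actual pipeline:

```python
# Distillation with validation: generate a solution with the older model,
# keep it only if the final answer matches a known ground truth.
from openai import OpenAI

client = OpenAI()

def solve(problem: str) -> str:
    """Ask the teacher model (hypothetical deployment name) for a solution."""
    r = client.chat.completions.create(
        model="qwen2-teacher",  # placeholder name, not a real endpoint
        messages=[{"role": "user",
                   "content": problem + "\nSolve step by step. End with 'ANSWER: <number>'."}],
    )
    return r.choices[0].message.content

def verified_pairs(problems_with_answers: list[tuple[str, float]]):
    """Yield (problem, solution) pairs whose final answer checks out.
    This is the validation step the post argues is unavoidable."""
    for problem, truth in problems_with_answers:
        sol = solve(problem)
        try:
            answer = float(sol.rsplit("ANSWER:", 1)[-1].strip().rstrip("."))
        except ValueError:
            continue  # unparseable output: discard, don't train on it
        if abs(answer - truth) < 1e-6:
            yield problem, sol
```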


On o1-preview: the reasoning part of it is much better vs. GPT-4o. Definitely not at human level though. I want to get better at prompting it at this point. But the problem is that the run time is so much worse than GPT-4o's. OpenAI needs to stick in more GPUs if it wants to make more money off users. It's already the most expensive API out there, and it's certainly not the fastest.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Updates from the Huawei Connect event.


They intend to spend 1B RMB a year to train Kunpeng and Ascend talent.

They will also develop Euler, CANN, and MindSpore, the basic tech behind Huawei's AI push.

They are gifting 100k Kunpeng and Ascend development boards/toolkits to various schools and universities.
 