My commentary on Qwen-2.5-Max. I think they got a little unlucky here, with DeepSeek beating them to the punch and stealing everyone's hearts. They may well have a better model than V3 now, but it came a month too late and without a comparably addictive reasoning model.
This model has the potential to become world-class.
It has been pre-trained on 20T tokens!!!
That is 20,000 billion tokens. For reference, Llama 3 405B was trained on 15T tokens, a record at the time. OpenAI's o1 and Google's Gemini may also be trained on similar token budgets, but they don't disclose this important info.
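To put that budget in perspective, here is a rough back-of-the-envelope using the common ~6·N·D FLOPs approximation for dense transformer training. Qwen hasn't disclosed Qwen-2.5-Max's parameter count (it is a MoE), so I simply reuse Llama 3's 405B as a stand-in size; a sketch, not a real compute estimate for their model:

```python
# Back-of-the-envelope training compute via the common ~6 * N * D
# approximation (roughly 6 FLOPs per parameter per training token).
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# Llama 3 405B on its disclosed 15T-token budget.
llama3 = train_flops(405e9, 15e12)
# Hypothetical same-size model on a 20T-token budget
# (Qwen-2.5-Max's parameter count is not public).
qwen_scale = train_flops(405e9, 20e12)

print(f"405B params @ 15T tokens: ~{llama3:.1e} FLOPs")     # ~3.6e25
print(f"405B params @ 20T tokens: ~{qwen_scale:.1e} FLOPs")  # ~4.9e25
```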
After the pre-training phase you get what is called the base model. And their base model is already world-class, as we can see in this table of base-model comparisons (OpenAI, Anthropic and Google don't allow access to their base models). In particular, it is stronger than DeepSeek-V3 base, the base model underlying R1.
Pre-training is by far the most resource-consuming phase of training, and it is when the model accumulates its knowledge of the world.
The following phase, called fine-tuning or post-training, is much lighter from a resource-budget point of view, but it is the key one for making all that knowledge come to fruition, for making all the good features of the model emerge (including reasoning).
A strong base model will always develop, after post-training, into a strong instruct (i.e. finished) model.
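To make the pre-training vs. post-training split concrete, here is a minimal sketch of what supervised fine-tuning looks like mechanically. The checkpoint (a small open Qwen model) and the two toy instruction pairs are my placeholders for illustration, not Qwen's actual recipe or data:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model used purely for illustration.
name = "Qwen/Qwen2.5-0.5B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Toy instruction-response pairs standing in for a real SFT corpus.
pairs = [
    ("What is 2 + 2?", "2 + 2 = 4."),
    ("Name a prime number.", "7 is a prime number."),
]

optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

for prompt, answer in pairs:
    text = f"User: {prompt}\nAssistant: {answer}{tok.eos_token}"
    batch = tok(text, return_tensors="pt")
    # Causal-LM SFT: next-token cross-entropy with labels = input_ids.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

The point is that this touches the same weights as pre-training but over orders of magnitude fewer tokens, which is why the strong base model is the hard, expensive part.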
So, now that the recipe for reasoning is out in the open (btw, the recipe involves post-training, not pre-training), we can be very confident that this base model will evolve into a top-class model, better than R1, within 2-3 months.
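For intuition on that recipe: the R1-style approach, as described in DeepSeek's paper, is reinforcement learning against rule-based, verifiable rewards rather than a learned reward model. Here is a hedged sketch of such a reward function; the \boxed{} answer convention and the 0/1 reward shape are my illustrative choices:

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the content of a final \\boxed{...} answer, if present."""
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    return m.group(1).strip() if m else None

def reward(completion: str, gold: str) -> float:
    """+1 for a verifiably correct final answer, 0 otherwise."""
    ans = extract_final_answer(completion)
    return 1.0 if ans is not None and ans == gold.strip() else 0.0

# Example: a completion that reasons, then commits to an answer.
sample = "Let's think: 12 * 12 = 144, so the answer is \\boxed{144}."
print(reward(sample, "144"))  # -> 1.0
```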
As the Qwen team themselves put it: "Our base models have demonstrated significant advantages across most benchmarks, and we are optimistic that advancements in post-training techniques will elevate the next version of Qwen2.5-Max to new heights."