Artificial Intelligence thread

9dashline · Dec 20, 2024

Hyper said:
They said that curiosity killed the cat.

What is really sad is they are coming out with a $2000/mo subscription today for the "o3" model.

At this rate it will be cheaper just to hire a Jia Hind and have him use QwQ

https://www.reddit.com/r/OpenAI/comments/1higq81

Bellum_Romanum · Dec 20, 2024

9dashline said:
What is really sad is they are coming out with a $2000/mo subscription today for the "o3" model.

At this rate it will be cheaper just to hire a Jia Hind and have him use QwQ

https://www.reddit.com/r/OpenAI/comments/1higq81

Sam Altman thinks he's a big time PIMP and the gullible AI ignorant masses (myself included) are his LITERAL HO-ES that will be dumb enough to HO for his $$$ ridiculous subscription fee. Lol

tphuang · Dec 20, 2024

btw, no AI shop can afford to pay the kind of rates that OpenAI is charging for o1

9dashline · Dec 20, 2024

Bellum_Romanum said:
Sam Altman thinks he's a big time PIMP and the gullible AI ignorant masses (myself included) are his LITERAL HO-ES that will be dumb enough to HO for his $$$ ridiculous subscription fee. Lol

Google is currently offering Gemini 2.0 Thinking Experimental (an o1 competitor) for free to everyone... so this is a price-undercutting strategy, a loss leader to gain market share and mindshare; and Qwen is coming out with QvQ, a 72B vision reasoning model, soon

There is no moat


Overbom · Dec 20, 2024

OpenAI is cooking some good stuff

OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set - has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.

Despite the significant cost per task, these numbers aren't just the result of applying brute force compute to the benchmark. OpenAI's new o3 model represents a significant leap forward in AI's ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

Caveats

Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.

Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training). This demonstrates the continued possibility of creating challenging, unsaturated benchmarks without having to rely on expert domain knowledge. You'll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.

9dashline · Dec 20, 2024

Overbom said:
OpenAI is cooking some good stuff
View attachment 141223

Caveats

yup, the price needs to drop about three orders of magnitude before it becomes useful for replacing STEM grads... but it does show that inference-time compute scaling will get LLMs a bit further down the path before hitting a wall etc

tygyg1111 · Dec 21, 2024

Bellum_Romanum said:
Why the F did you even pay that amount to begin with? Are you serious? I thought YOU WERE SMARTER than this.

He lives on the edge

FairAndUnbiased · Dec 21, 2024

Overbom said:
OpenAI is cooking some good stuff
View attachment 141223

Caveats

Looks to me like the STEM grad steamrolls it in cost-effectiveness, at 10 points of score per 1 USD.

Overbom · Dec 21, 2024

FairAndUnbiased said:
Looks to me like the STEM grad steamrolls it in cost-effectiveness, at 10 points of score per 1 USD.

First it was impossible, now the objection is cost-effectiveness. Next year the cost should be equal.
