Artificial Intelligence thread

9dashline · Nov 18, 2024

9dashline said:
just api though, not open weights afaik

apparently they also have a long version with 10 million token context window

GulfLander · Nov 19, 2024

Chinese tech firms building AI teams in Silicon Valley?


tphuang · Nov 19, 2024

GulfLander said:
Chinese tech firms building AI teams in Silicon Valley?


Well, they've had big offices in Silicon Valley for a while now, especially ByteDance.

And believe it or not, they work 996!

diadact · Nov 19, 2024

This was unexpected
Step 2 is from Stepfun.
LiveBench is one of the most accurate benchmarks available.

tphuang · Nov 20, 2024

https://twitter.com/i/web/status/1859200141355536422

Go DeepSeek, some pretty strong claims here. Btw, o1-preview is the gold standard in reasoning to me, but it is so expensive (because it generates so many tokens and its cost per 1M tokens is 6x that of gpt-4o). So it would be good to have a little competition here.
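
A rough sketch of that cost arithmetic, in Python. Only the 6x per-token price ratio comes from the post above; the token ratio is a made-up illustrative number (an assumption, not a measurement), and `relative_cost` is just a hypothetical helper name.

```python
def relative_cost(price_ratio, token_ratio):
    """Return how many times more one response costs than the baseline.

    Cost is tokens * price_per_token, so the two ratios simply multiply.
    """
    return price_ratio * token_ratio


# From the post: o1-preview's price per 1M tokens is 6x that of gpt-4o.
PRICE_RATIO = 6
# ASSUMPTION: hypothetical figure for how many more tokens a reasoning
# model generates per answer; the real number depends on the prompt.
TOKEN_RATIO = 5

print(relative_cost(PRICE_RATIO, TOKEN_RATIO))
```

The point is only that the two effects compound: a higher price per token multiplied by more tokens per answer.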

9dashline · Nov 20, 2024

tphuang said:
https://twitter.com/i/web/status/1859200141355536422

Go DeepSeek, some pretty strong claims here. Btw, o1-preview is the gold standard in reasoning to me, but it is so expensive (because it generates so many tokens and its cost per 1M tokens is 6x that of gpt-4o). So it would be good to have a little competition here.

this will be open weights soon?

tphuang · Nov 20, 2024

9dashline said:
this will be open weights soon?

I would assume that the weights will be available, but my understanding is that o1-type models involve additional reasoning steps. So is just having the weights enough to replicate the model?

tphuang · Nov 20, 2024

https://twitter.com/i/web/status/1859302712803807696

According to Dylan Patel, DeepSeek has more than 50k H100s, so a pretty decent stash.

9dashline · Nov 20, 2024

tphuang said:
I would assume that the weights will be available, but my understanding is that o1-type models involve additional reasoning steps. So is just having the weights enough to replicate the model?

Yes, it should be.

def333 · Nov 20, 2024

China's new generation Tianhe supercomputer has won the top spot on the latest Big Data Green Graph 500 ranking


