Artificial Intelligence thread

AndrewJ

Junior Member
Registered Member
WTF is going on with DeepSeek? :rolleyes:

Why release V3.1 so quietly? Why remove R1? Why is R2 still not out after so many delays? :mad:

Chinese artificial intelligence start-up DeepSeek has updated its foundational V3 model and removed references to its reasoning model R1 from its chatbot, prompting speculation about a shift in the company’s research focus.
DeepSeek announced on Tuesday the release of the V3.1 model in a brief message to one of its WeChat user groups. The update expands the context window to 128k, allowing the model to hold more information – equivalent to a roughly 300-page book – during user interactions.
The company did not announce the update on its public social media channels, including its X account.
DeepSeek has also deleted references to the R1 model from its chatbot's "deep think" feature, raising questions about the progress of its much-anticipated next-generation R2 model.

 

tamsen_ikard

Senior Member
Registered Member
WTF is going on with DeepSeek? :rolleyes:

Why release V3.1 so quietly? Why remove R1? Why is R2 still not out after so many delays? :mad:



DeepSeek is a tiny team that likely had huge attrition from all the poaching after its huge rise in fame.

I think we should not expect much from DeepSeek any more. The key innovations will come from the big techs such as Qwen, ByteDance and so on.
 

gadgetcool5

Senior Member
Registered Member
Well, they did finally announce it on their X account.

Reasoning
There was only a 1 point gain on the Artificial Analysis Index for the reasoning version, and the token speed (which is abysmally low) was unchanged compared to 0528. On the other hand, token efficiency had a huge jump, and the cost to run also dropped by a large amount. It's also significantly better than 0528 in some narrower benchmarks like coding. Added together, it's definitely an upgrade, albeit a minor one. However, DeepSeek is no longer the best open-weights reasoning model; it lags behind Qwen3-235B, and has ever since the release of the latter.

Non-Reasoning
V3.1 saw significant gains on the Artificial Analysis Index and jumped ahead of GPT-4.1, which OpenAI released as its flagship non-reasoning model in April. However, here again, it's (slightly) behind Qwen3-235B non-reasoning, which is the world's best non-reasoning model.


Anecdotally, I've noticed the reasoning option does respond in about half the time it took before. That should be a noticeable improvement in user experience for users of the chat/app.
 

Michael90

Junior Member
Registered Member
Why release V3.1 so quietly? Why remove R1? Why is R2 still not out after so many delays? :mad:
Calm down. Lol. I think they will launch R2 before the end of the year. There have been some delays, probably due to using Huawei Ascend chips, which still need a few bugs ironed out, as mentioned in a recent article. It's not confirmed, but I believe it might be true, since I can't think of any other reason for such a long delay. However, they will get there. I think it's for the best: a bit of delay is worth it while securing the country's independent homegrown AI chip sector and ecosystem. Short-term pain for long-term gain.
 

TPenglake

Junior Member
Registered Member
Nvidia just stopped production of H20 chips.


I haven't seen this thread talk about it yet, but from what I'm reading, DeepSeek's latest update was a bit muted since, yes, there weren't any big changes. But on their Weibo they confirmed it was trained using the UE8M0 FP8 parameter precision scale. I just wrote that out verbatim since I don't know what the hell any of it means, but it does indicate that after relying on Nvidia's chips and then trying and being disappointed with Huawei's, DeepSeek seems to have found another source of advanced chips on which to train their model.

How that will factor into their future development, we shall see.
 

european_guy

Junior Member
Registered Member
New DeepSeek data format


Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats

This is something that went a bit under the radar, but IMHO it is a very interesting detail.

What does it mean?

A model parameter/weight is a number, no more, no less. DeepSeek has 671B parameters, which means it works by processing input across a huge number of big tables called matrices, each one holding millions of these parameters/numbers.

Now, how does a computer represent a number? It can use a 16-bit format like FP16, where each number is stored in 16 bits, or FP8, where each number is stored in 8 bits.

It can store the numbers as integers, as in INT8, or as floating point numbers, as in FP8. Floating point means that a number N is represented as a power of 2 multiplied by a fractional part:

N = s * m * 2^e, where e = exponent, m = mantissa, s = sign (-1 or 1)
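
For example, 6.5 can be written as +1 * 1.625 * 2^2, i.e. s = +1, m = 1.625, e = 2.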

[Image: FP8 E4M3 bit layout: 1 sign bit (blue), 4 exponent bits (orange), 3 mantissa bits (green)]

In the picture above, the orange part stores the exponent and the green part the mantissa. E4M3 means 4 bits for the exponent and 3 for the mantissa, plus 1 (the blue one) for the sign, for a total of 8 bits -> FP8.
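
To make the E4M3 layout concrete, here is a tiny decoder sketch in Python. It is purely illustrative (not anything from DeepSeek), and it assumes the common OCP E4M3 conventions: exponent bias 7, subnormals when the exponent field is zero, and a single NaN encoding.

# Illustrative E4M3 decoder, assuming the common OCP FP8 conventions:
# 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits,
# subnormals when the exponent field is 0, NaN when all other bits are 1.

def decode_e4m3(byte: int) -> float:
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp_field = (byte >> 3) & 0xF   # the 4 "orange" exponent bits
    man_field = byte & 0x7          # the 3 "green" mantissa bits

    if exp_field == 0xF and man_field == 0x7:
        return float("nan")                          # the only NaN pattern in E4M3
    if exp_field == 0:
        return sign * (man_field / 8.0) * 2.0**-6    # subnormal: no implicit leading 1
    return sign * (1.0 + man_field / 8.0) * 2.0**(exp_field - 7)

# 0x48 = 0 1001 000 -> +1 * 1.000 * 2^(9-7) = 4.0
# 0xC4 = 1 1000 100 -> -1 * 1.500 * 2^(8-7) = -3.0
print(decode_e4m3(0x48), decode_e4m3(0xC4))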

So how does the UE8M0 FP8 format used by DeepSeek work?

It is an 8-bit floating point number with all 8 bits used for the exponent and no mantissa. It is as if you can only represent powers of 2, so for instance:

exponent 1 (binary 00000001) -> 2^1 = 2
exponent 2 (binary 00000010) -> 2^2 = 4
exponent 3 (binary 00000011) -> 2^3 = 8

Now suppose you want to perform a multiplication:

2 * 4 = 8

In this UE8M0 format, 2 corresponds to exponent 1, 4 to exponent 2, and 8 to exponent 3. So

2 * 4 = 8 corresponds to summing the exponents 1 + 2 = 3

In this format, multiplication can be implemented as a sum: instead of a costly hardware multiplier circuit, a much simpler adder circuit can be used!

The above example is not a one-off: this is called a logarithmic number system, and it relies on the property that log(a*b) = log(a) + log(b).


Because by far the biggest operation in these models is matrix multiplication, this trick could simplify the hardware a lot.
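
To make the multiply-as-add trick concrete, here is a toy Python sketch. It follows the simplified view in this post (a stored value is just 2^e for an 8-bit unsigned exponent e); the real UE8M0 scale format also involves a bias and reserved encodings, which are ignored here.

import math

# Toy logarithmic-number-system sketch: store only the exponent,
# multiply by adding exponents. Simplified view, not the real spec.

def encode(value: float) -> int:
    """Store a power of two as its exponent (no sign, no mantissa)."""
    e = math.log2(value)
    assert e.is_integer() and 0 <= e <= 255, "only powers of two fit here"
    return int(e)

def decode(e: int) -> float:
    return 2.0 ** e

def multiply(a_exp: int, b_exp: int) -> int:
    # 2^a * 2^b = 2^(a + b): a cheap adder replaces the multiplier circuit
    return a_exp + b_exp

two, four = encode(2.0), encode(4.0)   # exponents 1 and 2
print(decode(multiply(two, four)))     # 8.0, obtained via 1 + 2 = 3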
 

tphuang

General
Staff member
Super Moderator
VIP Professional
Registered Member
This is pretty significant. Actually, I still mostly use Claude in Cursor. But if Qwen3 is really this good...


The DeepSeek + domestic chip UE8M0 FP8 is pretty big news.
New DeepSeek data format



So how does the UE8M0 FP8 format used by DeepSeek work?

It is an 8-bit floating point number with all 8 bits used for the exponent and no mantissa. It is as if you can only represent powers of 2, so for instance:

exponent 1 (binary 00000001) -> 2^1 = 2
exponent 2 (binary 00000010) -> 2^2 = 4
exponent 3 (binary 00000011) -> 2^3 = 8

Now suppose you want to perform a multiplication:

2 * 4 = 8

In this UE8M0 format, 2 corresponds to exponent 1, 4 to exponent 2, and 8 to exponent 3. So

2 * 4 = 8 corresponds to summing the exponents 1 + 2 = 3

In this format, multiplication can be implemented as a sum: instead of a costly hardware multiplier circuit, a much simpler adder circuit can be used!

But then the mantissa and sign are stored in another byte, right?

So effectively, you are storing floats in 2 bytes and ints in 1 byte? Otherwise, I'm not seeing the advantage of UE8M0 FP8 over FP16.
 