Artificial Intelligence thread

tphuang

General
Staff member
Super Moderator
VIP Professional
Registered Member
so if all of this is true, then it does seem like DeepSeek delayed some of its model development in order to adapt to this new FP format, so that it can fully utilize the next-generation Ascend chips, to which it will likely get low-cost access.

keep in mind that while the next Ascend chip hasn't been unveiled yet, DeepSeek likely has access to it already.
 

european_guy

Junior Member
Registered Member
But then the mantissa and sign are stored in another byte, right?

So effectively you are storing floats in 2 bytes and ints in 1 byte. Otherwise, I'm not seeing the advantage of UE8M0 FP8 over FP16.

No, the key point is that you don't have a mantissa. The paper linked in the tweet explains how, but intuitively it is as if you use only a subset of numbers (those that can be expressed with an exponent alone) and live with that restriction. According to the paper it still works very well.
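To make the "subset of numbers" concrete, here is a minimal sketch (my own illustration, not DeepSeek's code) of what an unsigned, exponent-only 8-bit value can represent, assuming an IEEE-style bias of 127 as in the OCP MX E8M0 scale format:

```python
# Illustration only: every UE8M0 value is an exact power of two.
# The bias of 127 is an assumption (matching the OCP MX E8M0 scale format).
BIAS = 127

def ue8m0_to_float(byte: int) -> float:
    """Decode one UE8M0 byte: no sign bit, no mantissa, just 2^(byte - BIAS)."""
    return 2.0 ** (byte - BIAS)

values = [ue8m0_to_float(b) for b in range(256)]
print(values[BIAS - 2 : BIAS + 3])   # [0.25, 0.5, 1.0, 2.0, 4.0]
```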


But after I wrote my post I stumbled on this:


UE8M0 is for scales, not for the actual tensor elements, which remain E4M3 or E5M2;

IOW, they use this new format only to scale other FP8 values up and down. So DeepSeek 3.1 doesn't use this (quite disruptive) idea; it simply stores the scales in UE8M0 instead of the usual FP16 or FP32. This mimics how NVIDIA's MXFP8 format works (like NVIDIA does from the H100 on), and today we know that some future domestic chip will also use the same MXFP8.
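For intuition, here is a rough sketch of the block-scaling idea (my own simplified illustration, not DeepSeek's or NVIDIA's actual kernel): each block of, say, 32 tensor elements shares one power-of-two scale, which is what UE8M0 would store, while the elements themselves stay in a regular FP8 format like E4M3.

```python
import math

# Simplified illustration of MXFP8-style block scaling (not actual DeepSeek code).
# One power-of-two scale per block of 32 values; the scaled elements would then
# be rounded to E4M3/E5M2, which is omitted here.
BLOCK = 32
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_block(block):
    amax = max(abs(x) for x in block) or 1.0
    # Pick a power-of-two scale so the largest element fits the E4M3 range.
    scale_exp = math.ceil(math.log2(amax / E4M3_MAX))
    scale = 2.0 ** scale_exp                  # this exponent is what UE8M0 stores
    scaled = [x / scale for x in block]       # these would be rounded to FP8
    return scale, scaled

scale, scaled = quantize_block([0.002 * i for i in range(BLOCK)])
print(scale, max(abs(x) for x in scaled))     # power-of-two scale, max fits E4M3
```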

So from a technical POV it is no longer so interesting, but it remains big news as an indirect announcement of a new Chinese chip coming soon (the new Ascend?)
 

siegecrossbow

General
Staff member
Super Moderator
New DeepSeek data format




This is something that went a bit under the radar, but IMHO it is a very interesting development.

What does it mean?

A model parameter/weight is a number, no more, no less. DeepSeek has 671B parameters, meaning it works by processing input through a huge number of big tables called matrices, each one holding millions of parameters/numbers.

Now, how does a computer represent a number? It can use a 16-bit format like FP16, where each number is stored in 16 bits, or an FP8 format, where each number is stored in 8 bits.

It can store the numbers as integers, as in int8, or as floating-point numbers, as in FP8. Floating point means that a number N is represented as a power of 2 multiplied by a fractional part:

N = s * m * 2^e, where e = exponent, m = mantissa, s = sign (-1 or 1)
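For example (my own illustration), with s = 1, m = 1.5 and e = 2: N = 1 * 1.5 * 2^2 = 6.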

[Attachment 158942: diagram of the FP8 E4M3 bit layout - sign, exponent and mantissa bits]

In the picture above the orange part stores the exponent and the green part the mantissa, so E4M3 means 4 bits for the exponent and 3 for the mantissa, plus 1 (the blue one) for the sign, for a total of 8 bits -> FP8.
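For the curious, here is a minimal sketch of decoding such an E4M3 byte (my own illustration, assuming the common bias of 7; NaN handling is simplified):

```python
def decode_e4m3(byte: int) -> float:
    """Decode an FP8 E4M3 byte: 1 sign bit, 4 exponent bits, 3 mantissa bits (bias 7)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp  = (byte >> 3) & 0xF          # the 4 "orange" exponent bits
    man  = byte & 0x7                 # the 3 "green" mantissa bits
    if exp == 0:                      # subnormal: no implicit leading 1
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3(0b01000000))   # sign 0, exponent 8, mantissa 0 -> 1.0 * 2^(8-7) = 2.0
```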

So how is the UE8M0 FP8 used by DeepSeek laid out?

It is an 8-bit floating-point number with all 8 bits used for the exponent and no mantissa. It is as if you can only represent powers of 2, so for instance

exponent 1 (01) -> corresponds to 2^1 = 2
exponent 2 (10) -> corresponds to 2^2 = 4
exponent 3 (11) -> corresponds to 2^3 = 8

Now suppose you want to perform a multiplication:

2 * 4 = 8

In this UE8M0 format, 2 corresponds to exponent 1, 4 corresponds to exponent 2 and 8 corresponds to exponent 3. So

2 * 4 = 8 corresponds to summing the exponents 1 + 2 = 3

In this format, multiplication can be implemented with a sum, so instead of a costly hardware multiplier circuit a much simpler adder circuit can be used!

The above example is not an isolated case: this is called a logarithmic number system, and it relies on the property that log(a*b) = log(a) + log(b).


Because the biggest operation in these models by far is matrix multiplication, this trick could simplify the hardware a lot.
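As a toy sketch of the whole idea (my own illustration, not DeepSeek's implementation, assuming an IEEE-style bias of 127 for the stored exponent): the 2 * 4 example becomes an integer addition, and a small dot product, the building block of matrix multiplication, needs no multiplier at all for the products (the accumulation is still done in ordinary floats here).

```python
import math

BIAS = 127  # assumed bias; the thread's example uses the unbiased exponents 1, 2, 3

def encode(x: float) -> int:
    """Store a power of two as its exponent only (the UE8M0 / log-number idea)."""
    return int(math.log2(x)) + BIAS

def decode(b: int) -> float:
    return 2.0 ** (b - BIAS)

# 2 * 4 = 8 becomes an integer addition of exponents (1 + 2 = 3):
print(decode(encode(2.0) + encode(4.0) - BIAS))          # 8.0

# A tiny dot product: every multiply is an integer add of exponents.
a = [2.0, 4.0, 8.0]
b = [4.0, 2.0, 2.0]
acc = sum(decode(encode(x) + encode(y) - BIAS) for x, y in zip(a, b))
print(acc)                                                # 8 + 8 + 16 = 32
```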

Isn’t this how slide rules work?
 

jnd85

New Member
Registered Member
I think AI is moving towards more efficient networks achieving the same performance, rather than better performance, because it is really hard to improve chatbot performance without a significant expansion in data quantity and quality.
If you look back at PC chip development, it followed a similar trend with clock speeds and performance-to-heat ratios. For the first couple of decades it was all about boosting performance while accepting increased heat as a necessary evil. Then eventually the inefficiency was too much to ignore, and chip producers had no choice but to put out several generations of chips that performed only the same as, or moderately better than, past generations, but ran cooler and more efficiently.

AI is kind of at the same point, performance levels already make it a useful tool, but it is incredibly inefficient. So now they have to focus on efficiency for a while, and that seems to be a trend across the board for all the LLM companies.

But where the chip manufacturers had to spend years before they could focus on performance again, I wager the efficiency gains will be realized and LLMs will start focusing on better performance and features much faster.
 