Chinese semiconductor thread II

tokenanalyst · Dec 25, 2024

tphuang said:
That actually would be a disappointment. Where did you get the computation figures?

How much is the H100?

diadact · Dec 25, 2024

That actually would be a disappointment. Where did you get the computation figures?

910C is just 2 improved 910B (320 TFLOPS) dies joined together like Blackwell
The actual performance is 800 TFLOPS (2 dies at 400 TFLOPS each) at FP16

How much is the H100?

989 TFLOPS at FP16
Interconnect speed is the bigger problem
910C is just a stopgap for Huawei
Ascend 920/910D is the real deal
It will be on SMIC N+2 and interconnect speed will be more than 1Tbps
Single-die performance will match H100
While the chip as a whole (2 dies joined together like in Blackwell) will match B200
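
For anyone checking the arithmetic, here is a minimal sketch that uses only the figures quoted in this post. The constants are the poster's claims, not official specifications, and the helper names are hypothetical, chosen for illustration.

```python
# FP16 throughput figures in TFLOPS, as claimed in the post above.
# These are forum claims, not vendor-confirmed specifications.
ASCEND_910B_TFLOPS = 320       # single 910B die
ASCEND_910C_DIE_TFLOPS = 400   # each improved die inside the 910C
H100_TFLOPS = 989              # H100 figure quoted in the post


def package_tflops(per_die: float, dies: int = 2) -> float:
    """Peak throughput of a package built from identical dies (hypothetical helper)."""
    return per_die * dies


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100


total_910c = package_tflops(ASCEND_910C_DIE_TFLOPS)
per_die_uplift = percent_change(ASCEND_910B_TFLOPS, ASCEND_910C_DIE_TFLOPS)
ratio_to_h100 = total_910c / H100_TFLOPS

print(f"910C package: {total_910c} TFLOPS")
print(f"Per-die uplift over 910B: {per_die_uplift:.0f}%")
print(f"910C relative to H100: {ratio_to_h100:.2f}x")
```

On these numbers the two-die package reaches about four fifths of the quoted H100 figure, which is why the comparison depends so heavily on which H100 number is used.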

tphuang · Dec 25, 2024

diadact said:
910C is just 2 improved 910B (320 TFLOPS) dies joined together like Blackwell
The actual performance is 800 TFLOPS (2 dies at 400 TFLOPS each) at FP16

989 TFLOPS at FP16
Interconnect speed is the bigger problem
910C is just a stopgap for Huawei
Ascend 920/910D is the real deal
It will be on SMIC N+2 and interconnect speed will be more than 1Tbps
Single-die performance will match H100
While the chip as a whole (2 dies joined together like in Blackwell) will match B200

A simple Google search will show that the H100 does more computation than that at FP16.

If you have proof that the 910C is two 910B dies stitched together, then please provide it.

diadact · Dec 25, 2024

tphuang said:
A simple Google search will show that the H100 does more computation than that at FP16.

Nvidia intentionally uses the sparsity-enabled performance figure in their official press releases
Nobody uses this configuration for training or inference
I'm talking about dense performance

If you have proof that the 910C is two 910B dies stitched together, then please provide it.

The picture @olalavn posted literally has 2 dies stitched together like in Blackwell or Apple M series
Kirin PC chip will also be like that
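
The two H100 numbers in this exchange differ only by sparsity. Nvidia's H100 SXM datasheet lists 1,979 TFLOPS for FP16 Tensor Core with sparsity, and 2:4 structured sparsity doubles the peak figure, so the dense number is half of that. A minimal sketch of the conversion, with a hypothetical helper name:

```python
# 2:4 structured sparsity doubles the advertised peak Tensor Core throughput,
# so the dense figure is the sparse figure divided by 2.
SPARSITY_SPEEDUP = 2.0

H100_SXM_FP16_SPARSE_TFLOPS = 1979  # datasheet figure, with sparsity


def dense_from_sparse(sparse_tflops: float, speedup: float = SPARSITY_SPEEDUP) -> float:
    """Convert a sparsity-enabled peak figure to its dense equivalent (hypothetical helper)."""
    return sparse_tflops / speedup


h100_dense = dense_from_sparse(H100_SXM_FP16_SPARSE_TFLOPS)
print(f"H100 SXM FP16 dense: {h100_dense} TFLOPS")
```

So a search that turns up roughly 2,000 TFLOPS and a post that quotes 989 TFLOPS can both be describing the same chip.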

tphuang · Dec 25, 2024

diadact said:
Nvidia intentionally uses the sparsity-enabled performance figure in their official press releases
Nobody uses this configuration for training or inference
I'm talking about dense performance

That’s fine. State it as such.

diadact said:
The picture @olalavn posted literally has 2 dies stitched together like in Blackwell or Apple M series
Kirin PC chip will also be like that

That’s fine. Nobody is doubting that the 910C uses two dies. What I am asking for is proof that it uses two 910B dies. You are telling me they spent 2 years and haven’t added more computation density?

diadact · Dec 25, 2024

You are telling me they spent 2 years and haven’t added more computation density?

80 TFLOPS improvement (25% more performance) on each die
Each die on Blackwell also has a 25% compute improvement compared to H100
Ascend 910B/C is on SMIC N+1
Adding more transistors would have led to worse yields
As I said before 910C is a stopgap
They learned how to stitch together dies through high-bandwidth fabric
Most of their time was spent on Ascend 920 and improving interconnect speed
You will see all the GPU innovation that Huawei has developed since being sanctioned in the Ascend 920 (late 2025)
Huawei will remain 1 generation behind Nvidia until they can fab on EUV

OppositeDay · Dec 25, 2024

tphuang said:
I do think this is unlikely, since we haven’t seen any major declines in SMIC's overall ASP. It’s just too hard for 28nm prices to drop this much without being noticed, although I am sure they have dropped a lot for commoditized types like DDICs.

What do you expect when it's written by some blogger named Little Drawing Fairy? People should refrain from posting stuff from random nobodies.

olalavn · Dec 25, 2024

tphuang said:
That actually would be a disappointment. Where did you get the computation figures?

It's just a test version of the SMIC process. It's also a low-power version, not a high-performance version yet. The 910D coming next year will have a different look.

olalavn · Dec 26, 2024

A domestic chiplet-based self-driving chip, billed as the world's first, is here! Arctic Xiongxin's Qiming 935A has been successfully powered on

tphuang · Dec 26, 2024

diadact said:
80 TFLOPS improvement (25% more performance) on each die
Each die on Blackwell also has a 25% compute improvement compared to H100
Ascend 910B/C is on SMIC N+1
Adding more transistors would have led to worse yields
As I said before 910C is a stopgap
They learned how to stitch together dies through high-bandwidth fabric
Most of their time was spent on Ascend 920 and improving interconnect speed
You will see all the GPU innovation that Huawei has developed since being sanctioned in the Ascend 920 (late 2025)
Huawei will remain 1 generation behind Nvidia until they can fab on EUV

So basically, there is absolutely no proof.

910B is already at 320 TFLOPS.

As for the rest, I have already explained this recently. The majority of Ascend dies produced this past year were made at TSMC on the N7 process.
