Chinese semiconductor thread II

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
The total interconnect speed of 910-B is already equivalent to that of H800, which is about 400GB/s.
Interconnect speed is limited by SerDes, which is actually process-node limited.
I distinctly remember an Ascend slide from 2023 that showed router-level speed of an 8-card Ascend cluster to be 1.6TB/s, which would work out to 200GB/s between the cards.
I would gladly see evidence that it is now 4.8TB/s, if that's available.
 

huemens

Junior Member
Registered Member
Interconnect speed is limited by SerDes, which is actually process-node limited.
I distinctly remember an Ascend slide from 2023 that showed router-level speed of an 8-card Ascend cluster to be 1.6TB/s, which would work out to 200GB/s between the cards.
I would gladly see evidence that it is now 4.8TB/s, if that's available.

Each card has a total interconnect bandwidth of 400GB/s. In Huawei's topology each card connects directly to the other 7 cards in the cluster in a peer-to-peer fashion, so each card's 400GB/s is divided into 7 links that go to the other 7 cards. The per-link speed is therefore 400/7 ≈ 57.14GB/s. The total number of links within the 8-card cluster is 28, and (400/7) * 28 = 1.6TB/s.
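
As a quick sketch of that arithmetic (the only inputs are the 8-card count and the 400GB/s per-card figure above; everything else is derived):

```python
# Full-mesh link math for an 8-card cluster with 400GB/s of total
# interconnect bandwidth per card (figures from the post above).
cards = 8
per_card_bw = 400                      # GB/s per card

peers = cards - 1                      # each card links directly to 7 peers
per_link_bw = per_card_bw / peers      # 400/7 ≈ 57.14 GB/s per link
links = cards * peers // 2             # 28 unique links in the mesh
aggregate_bw = per_link_bw * links     # ≈ 1600 GB/s = 1.6 TB/s

print(f"per link: {per_link_bw:.2f} GB/s, links: {links}, aggregate: {aggregate_bw:.0f} GB/s")
```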

Nvidia uses a switched topology where every card is connected to multiple switches. The practical difference is that in Huawei's topology the full 400GB/s is only saturated when a card is talking to all 7 other cards, while in Nvidia's topology the full 400GB/s can be used between just 2 cards, because every card is connected to every other card through multiple links. Here's a hypothetical 4-card cluster (Huawei on the left, Nvidia on the right). Huawei's approach is more cost effective because the additional switching fabric is not required, but it is at a disadvantage for any application that needs the full 400GB/s between just 2 cards.


[Image: huawei-nvidia-1.png – hypothetical 4-card cluster, Huawei full mesh vs Nvidia switched topology]
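
To make the practical difference concrete, a small illustrative sketch of the best-case bandwidth between a single pair of cards in the hypothetical 4-card cluster above (the 400GB/s per-card figure is the only input; the rest is illustration, not measured data):

```python
# Best-case bandwidth between one pair of cards in a 4-card cluster,
# 400GB/s of total interconnect per card (illustrative numbers only).
cards = 4
per_card_bw = 400                          # GB/s per card

# Full mesh: a card's bandwidth is split statically across its 3 peers,
# so one pair of cards can only ever use a single 400/3 GB/s link.
mesh_pair_bw = per_card_bw / (cards - 1)

# Switched: all of a card's links terminate on switches, so the fabric
# can steer the card's entire 400GB/s towards a single peer if needed.
switched_pair_bw = per_card_bw

print(f"full mesh, one pair: {mesh_pair_bw:.1f} GB/s")
print(f"switched, one pair:  {switched_pair_bw} GB/s")
```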
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
Each card has a total interconnect bandwidth of 400GB/s. In Huawei's topology each card connects directly to the other 7 cards in the cluster in a peer-to-peer fashion, so each card's 400GB/s is divided into 7 links that go to the other 7 cards. The per-link speed is therefore 400/7 ≈ 57.14GB/s. The total number of links within the 8-card cluster is 28, and (400/7) * 28 = 1.6TB/s.

Nvidia uses a switched topology where every card is connected to multiple switches. The practical difference is that in Huawei's topology the full 400GB/s is only saturated when a card is talking to all 7 other cards, while in Nvidia's topology the full 400GB/s can be used between just 2 cards, because every card is connected to every other card through multiple links. Here's a hypothetical 4-card cluster (Huawei on the left, Nvidia on the right). Huawei's approach is more cost effective because the additional switching fabric is not required, but it is at a disadvantage for any application that needs the full 400GB/s between just 2 cards.


Interesting, but the problem, looking at the latest NVLink & NVSwitch setup, is that H200 will increase the interconnect bandwidth to 900GB/s and Blackwell increases it to 1.8TB/s.
So, Huawei does have some work to do here to catch up. I looked through my slides and the best I can find in terms of where they are getting to is this. If they are still on this path, then using an optical switch for their clusters might be the big innovation for Ascend-910C.
[Image: Screenshot 2024-09-29 at 3.00.43 PM.png]
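
Putting those per-card figures side by side (the 910B number is the 400GB/s quoted earlier in this thread; the Nvidia numbers are the commonly quoted per-card NVLink totals), a rough sketch of the gap:

```python
# Per-card interconnect bandwidth figures quoted in this thread plus the
# commonly cited NVLink totals, and the multiple relative to Ascend 910B.
per_card_bw = {                 # GB/s, bidirectional per card
    "Ascend 910B": 400,
    "A100 (NVLink 3)": 600,
    "H100/H200 (NVLink 4)": 900,
    "Blackwell (NVLink 5)": 1800,
}

base = per_card_bw["Ascend 910B"]
for name, bw in per_card_bw.items():
    print(f"{name}: {bw} GB/s ({bw / base:.2f}x of 910B)")
```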
 

huemens

Junior Member
Registered Member
Interesting, but the problem, looking at the latest NVLink & NVSwitch setup, is that H200 will increase the interconnect bandwidth to 900GB/s and Blackwell increases it to 1.8TB/s.
So, Huawei does have some work to do here to catch up. I looked through my slides and the best I can find in terms of where they are getting to is this. If they are still on this path, then using an optical switch for their clusters might be the big innovation for Ascend-910C.

A switched architecture would definitely help. A100 has 12 links of 50GB/s per card, giving a total of 600GB/s per card. Nvidia has maintained the same 50GB/s per link even in H100, but increased the total per-card bandwidth by adding more links per card and more switches. Each H100 card has 18 links, so it gets a total of 900GB/s.
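
The same link math as a short sketch (link counts and per-link speed as stated above):

```python
# NVLink per-card totals from link count x per-link speed,
# using the link counts and 50GB/s per-link figure stated above.
per_link_bw = 50                         # GB/s per NVLink link

for gpu, links in {"A100": 12, "H100": 18}.items():
    print(f"{gpu}: {links} links x {per_link_bw} GB/s = {links * per_link_bw} GB/s per card")
```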
 

interestedseal

Junior Member
Registered Member
Interconnect speed is limited by SerDes, which is actually process-node limited.
I distinctly remember an Ascend slide from 2023 that showed router-level speed of an 8-card Ascend cluster to be 1.6TB/s, which would work out to 200GB/s between the cards.
I would gladly see evidence that it is now 4.8TB/s, if that's available.

SerDes is not an issue for Huawei. They've had 112G SerDes tech since 2021, and may already be using 224G SerDes in their 100T Ethernet switch.
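
As a back-of-the-envelope illustration of how SerDes lane speed maps to that kind of per-card bandwidth (ignoring encoding and protocol overhead, and treating the 400GB/s figure from this thread as a raw one-direction aggregate):

```python
# Rough lane count needed to carry a given per-card interconnect bandwidth,
# ignoring encoding/protocol overhead (illustrative only, not a real design).
target_gbytes = 400                  # GB/s per card, figure from this thread
target_gbits = target_gbytes * 8     # 3200 Gb/s

for serdes_gbits in (112, 224):
    lanes = target_gbits / serdes_gbits
    print(f"{serdes_gbits}G SerDes: ~{lanes:.0f} lanes for {target_gbytes} GB/s")
```

Moving from 112G to 224G SerDes roughly halves the lane count needed for the same aggregate bandwidth, which is why the SerDes generation matters for how much interconnect can be brought out of a single card.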
 

Hyper

Junior Member
Registered Member
Interesting, but the problem, looking at the latest NVLink & NVSwitch setup, is that H200 will increase the interconnect bandwidth to 900GB/s and Blackwell increases it to 1.8TB/s.
So, Huawei does have some work to do here to catch up. I looked through my slides and the best I can find in terms of where they are getting to is this. If they are still on this path, then using an optical switch for their clusters might be the big innovation for Ascend-910C.
This looks much worse for Cisco than for Huawei. Nvidia has already started selling Ethernet and InfiniBand switches with custom ASICs. Nvidia is probably the real full data center solution competitor to Huawei.
 