Note: I have no proof of this other than my word.
Recently met with a Huawei employee who was pitching their 910B chips for GenAI. We didn't end up going with them, but in the process I learned some interesting tidbits of information:
- Huawei 910C is the same architecture as 910B
- The 910C is aiming for 800 TFLOPS of fp16 (unclear whether that is with fp32 or fp16 accumulation) -- it was mentioned that their goal is roughly on par with the Nvidia H200 NVL
- The 910C is on a Chinese 7nm process
- The 910C aims to use Chinese HBM2e; they provided no comment on capacity or bandwidth
- The 910C aims to resolve serious cross-card interconnect issues present in the 910B, which rendered the 910B unsuitable for training LLMs
- They mentioned that the chief designer of the Huawei Ascend chips, who did the first Ascend design, was a Chinese student educated in the USA. No details were provided on whether his US education was undergrad or PhD, but they mentioned his initial design focus was edge/low-power inference. They also mentioned that a significant part of their EDA & compiler teams had undergrad/PhD educations from the US.
- They are aiming for an exact silicon doubling of the 910B. They suggested this was done via chiplets, but were evasive when I pushed for details and tried to confirm this
- Their goal is public sampling in 2025 Q1 or Q2
- They claimed better PyTorch compatibility than AMD, and said it was comparable to Intel's current GPU compatibility
- They claimed significant PyTorch compatibility improvements since 2024 Q1, when the 910B launched, and mentioned that a large effort was put into PyTorch operator compatibility/accuracy under fp16, as well as into their own NPU API, called ACL
- They grumbled about the 910B being prioritized for some "cloud" infrastructure customers who didn't have a viable cloud business and required significant on-site ecosystem support. They liked working with the GenAI startups who had the skills for scale-out infrastructure
- They mentioned that demand outstripped supply as a whole
- They grumbled about certain customers still preferring to use smuggled Nvidia chips rather than their solution
- They grumbled about having to be bug compatible with Nvidia, and efforts to resolve accuracy issues
- They are aiming for a new architecture for whatever succeeds the 910C
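
For anyone wondering why I care whether the 800 TFLOPS figure is with fp32 or fp16 accumulate: here is a rough sketch of the difference. It is pure Python and only emulates the rounding with `struct`; it says nothing about how the 910C or any other chip actually implements its matrix units. The helper names (`to_fp16`, `to_fp32`, `accumulate`) are my own, made up for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(values, round_acc):
    """Sum values, rounding the running total after every add.

    round_acc stands in for the accumulator width (an assumption of this
    sketch, not a model of any real hardware pipeline).
    """
    acc = 0.0
    for v in values:
        acc = round_acc(acc + v)
    return acc

# 10,000 inputs that are already fp16 values close to 0.01.
values = [to_fp16(0.01)] * 10_000

fp16_sum = accumulate(values, to_fp16)  # fp16 inputs, fp16 accumulator
fp32_sum = accumulate(values, to_fp32)  # fp16 inputs, fp32 accumulator

print("fp16 accumulate:", fp16_sum)
print("fp32 accumulate:", fp32_sum)
```

The inputs are identical in both runs; only the accumulator width changes. The fp32 accumulator lands at about 100.02 (0.01 rounds slightly up in fp16), while the fp16 accumulator gets stuck at 32.0, because past that point the spacing between adjacent fp16 values is larger than twice the increment and every add rounds back down. Long dot products in LLM training behave the same way, so an fp16 TFLOPS number with fp16 accumulate is not really comparable to one with fp32 accumulate.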