The story about the 12nm Ascend may have originated from a misinterpretation of a Huawei paper from June 2024. The paper proposes HiFloat8 (HiF8), a new 8-bit floating-point format from Huawei that does not currently exist in any production chip, so all of the experiments were done in simulation. The authors test how HiF8 would perform if it were added to Ascend microarchitectures referred to as XXX1 and XXX2, and they estimate the extra chip area HiF8 would need by simulating a 12nm XXX1 core using RTL code.
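Since no silicon implements HiF8 yet, studies like this emulate the format in software: values are quantized to the proposed 8-bit encoding and the resulting accuracy or training behavior is measured. Below is a minimal sketch of that idea in Python. It uses the standard E4M3 FP8 layout (4 exponent bits, 3 mantissa bits, bias 7) purely as a stand-in; HiF8's actual tapered encoding differs and is defined in the paper.

```python
import numpy as np

def quantize_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Round an FP32 array to the nearest representable value of a
    generic FP8 E4M3 format. NOTE: illustrative stand-in only; this
    is not HiF8's actual encoding."""
    max_normal = 448.0      # largest finite E4M3 magnitude
    min_normal_exp = -6     # smallest normal exponent (subnormals below)
    mantissa_bits = 3

    sign = np.sign(x)
    mag = np.clip(np.abs(x), 0.0, max_normal)

    # Exponent of each value, clamped so tiny inputs land in the
    # subnormal range instead of keeping spurious precision.
    with np.errstate(divide="ignore"):
        exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.maximum(exp, min_normal_exp)

    # Spacing between representable values at this exponent,
    # then round to the nearest representable point.
    quantum = 2.0 ** (exp - mantissa_bits)
    return sign * np.round(mag / quantum) * quantum

# Example: the kind of quantization-error measurement such a study
# would start from, on a batch of random FP32 weights.
w = np.random.randn(1024).astype(np.float32)
err = np.abs(w - quantize_fp8_e4m3(w)).mean()
print(f"mean abs quantization error: {err:.5f}")
```

Running numerics through an emulation like this answers the accuracy question; the separate RTL simulation mentioned above answers the hardware-cost question (how much die area the extra format logic adds).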
This doesn't necessarily mean they would use 12nm for their production chips.
But if they are really capacity-constrained on 7nm, then fabbing AI chips on a 12nm process and dedicating all 7nm capacity to smartphones and laptops isn't the worst idea. AI workloads are highly parallelized and run in data centers with an abundance of power. They could get a higher volume of chips, keep per-chip compute the same by scaling out with chiplets, and compensate for the older process with more power.
Currently, no hardware platform is available to support the HiF8 data format and complete the computation process.
This paper also evaluates the overheads and benefits of HiF8 based on the electronic system level (ESL) models of Ascend XXX1 and XXX2. When Ascend XXX1 serves as the baseline, if the computing power of HiF8 is twice that of FP16, the area of AI Core increases by approximately 4.5%, and the training performance of ResNet50 and BERT can be improved by 26% and 61%, respectively. If Ascend XXX2 serves as the baseline, the training performance of ResNet50 and BERT can be improved by 31% and 67%, respectively.
We evaluated the area overhead of HiF8 by using the register-transfer level (RTL) pilot code based on the microarchitecture of the Cube processing unit in Ascend XXX1 when K equals 32 and the chip manufacturing process is 12 nm.