Hendrik_2000
Lieutenant General
@Hendrik_2000
Huawei also has a 7nm GPU, but without access to TSMC's fabs it was placed on the back burner. Hopefully SMIC's 7nm N+2 node comes online this year; they have a lot of customers lining up to use it.
I don't know anything about the Huawei GPU, but this new GPU is geared more toward AI, servers and machine learning, so it is closer to Nvidia's products. Here is the spec. GPUs are now the trend in the industry, supplanting CPUs, so this is a big break for China: for once they have a product that can go toe to toe with Nvidia.
Unlike the GPUs announced back in 2019, the new BI models are not specifically designed to compete with consumer gaming models, as they are more tailored towards AI and HPC applications, plus other general-purpose uses for the education, medicine and security sectors.
The BI packs 24 billion transistors and is based on a home-grown GPU architecture that offers an impressive price/performance ratio, the report said.
The chip is built with the cutting-edge 7nm process node and 2.5D CoWoS (chip-on-wafer-on-substrate) packaging.
According to Tom's Hardware, Tianshu Zhixin commenced development on its BI chip in 2018.
The company finalized the tapeout for BI back in May 2020, and the chip should already have entered mass production if Tianshu Zhixin wants to meet its goal of commercializing it this year.
The BI solution provides twice the performance of existing mainstream products on the market, while also offering a very appealing performance-to-cost ratio, the report said.
Its main profile is, of course, machine learning and serving the HPC market, and to this end it also uses TSMC's CoWoS technology, which packages the memory on the same substrate as the GPU die, thus enabling very high memory bandwidth.
The memory is probably HBM2, but this has not been specifically confirmed, the report said.
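To get a rough sense of what CoWoS-packaged HBM2 buys you, here is a back-of-the-envelope bandwidth calculation. The BI card's stack count and pin speed have not been disclosed, so the numbers below are generic HBM2 figures for illustration only, not the card's actual spec.

```python
# Back-of-the-envelope HBM2 bandwidth; the BI card's stack count and pin
# rate are NOT disclosed, so these are generic HBM2 figures for illustration.
stacks = 4                # assumed number of HBM2 stacks on the interposer
bus_width_bits = 1024     # per-stack interface width defined by the HBM2 spec
pin_rate_gbps = 2.0       # per-pin data rate of baseline HBM2

bandwidth_bytes = stacks * bus_width_bits * pin_rate_gbps * 1e9 / 8
print(f"~{bandwidth_bytes / 1e12:.1f} TB/s aggregate bandwidth")  # ~1.0 TB/s
```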
BI supports a plethora of numeric formats, including the floating-point formats FP32, FP16 and BF16 as well as the integer formats INT32, INT16 and INT8, just to mention a few.
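Side note on why both FP16 and BF16 show up in that list: BF16 keeps FP32's exponent range but only 7 mantissa bits, so it tolerates the large dynamic ranges seen in training where FP16 overflows, at the cost of precision. A minimal sketch in plain Python (nothing to do with Tianshu's hardware) illustrating the trade-off:

```python
import struct
import numpy as np

def to_bf16(x: float) -> float:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits
    (1 sign + 8 exponent + 7 mantissa bits), then widen back to float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# FP16 (1 sign + 5 exponent + 10 mantissa bits) keeps more precision but has
# a far smaller range: it overflows to inf beyond ~65504, where BF16 does not.
for v in (0.1, 70000.0):
    print(f"{v}: fp16={float(np.float16(v))}  bf16={to_bf16(v)}")
```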
Tianshu Zhixin is keeping tight-lipped about the BI's performance, but the company has teased FP16 performance of up to 147 TFLOPS (trillions of floating-point operations per second), the report said.
For comparison, the Nvidia A100 and AMD Instinct MI100 deliver FP16 performance figures up to 77.97 TFLOPS and 184.6 TFLOPS, respectively.
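Taking those peak FP16 figures at face value (vendor peak claims, not measured benchmarks), the ratios work out roughly like this:

```python
# Ratios of the peak FP16 figures quoted above (vendor peak claims, not
# measured benchmarks).
peak_fp16_tflops = {
    "Tianshu Zhixin BI": 147.0,
    "Nvidia A100": 77.97,
    "AMD Instinct MI100": 184.6,
}
bi = peak_fp16_tflops["Tianshu Zhixin BI"]
for name, tflops in peak_fp16_tflops.items():
    print(f"BI vs {name}: {bi / tflops:.2f}x")  # ~1.89x the A100, ~0.80x the MI100
```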
While most of the home-grown Chinese processors we have seen thus far have under-delivered in direct performance comparisons with mainstream US models, if China keeps this up on the performance side and also offers aggressive pricing, we could be seeing some decent alternatives to the US models in just a few years.