More from Loongson.
They signed an agreement with Henan's Yuxin Electronics to provide chips that will support Henan province's digital transformation strategy.
A good article on Guancha about Loongson's progress. It's probably rosier than Loongson deserves.
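Recently, Loongson released the 3D5000 series of chips for the server market, which drew attention.
The 3D5000 belongs to the same CPU generation as the 3C5000. It uses chiplet technology to interconnect and package two 3C5000 dies together, yielding a single 32-core CPU; this approach is also called a "glued 32-core". In terms of performance, the 3D5000's IPC is close to that of AMD Zen 2, and whole-chip performance is close to a 32-core AMD Zen 2 CPU at the same clock, which is already enough for most applications.
In 2023, the more powerful Loongson 6000-series CPUs will arrive, and Loongson's IPC gap with Intel and AMD will narrow further. The real obstacle to Loongson's market adoption will no longer be CPU performance but the software ecosystem.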
Compares the recently launched 3D5000 to AMD Zen 2, which was launched in 2019 (so Loongson is about 3 years behind). Says that the more advanced 6000 series will come out in 2023 to further narrow the gap with Intel/AMD.
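The Loongson 5000 series is a milestone for indigenous CPUs
In 2019, the quad-core Loongson 3A4000 processor debuted. The 3A4000 is the new-generation processor succeeding the 3A3000. It upgraded to the new GS464V core, with a large IPC gain, and also doubled performance on the same manufacturing process as the 3A3000 through deeper tuning of the existing 28nm process and improved circuit and physical design methods. The 3A4000's IPC rose from the 3A3000's 7/G to 9.8/G. AMD's Zen is roughly at the 10/G level, so the 3A4000's GS464V is a core that can rival first-generation Zen.
The 3A5000's core is a minor revision of the GS464V, with an IPC of 10.6/G. It uses a 12nm process and clocks at 2.2G to 2.5G, with a later tape-out running at 2.7G. Its SPEC06 integer score exceeds 26 (GCC, @2.5GHz), which is very good for an indigenous CPU. Even compared with CPUs built on imported technology, its integer and floating-point performance is second only to Hygon and exceeds the other imported-technology x86 and ARM CPUs.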
Says that the 3A5000 is behind only Hygon in desktop CPUs among domestic options and better than the others (that would include Phytium and Zhaoxin here).
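To sanity-check the article's numbers, here is a minimal Python sketch. It assumes the "/G" figures are SPEC CPU2006 integer score per GHz (the article calls this IPC), so a single-core score is roughly that figure times the clock. The function and variable names are my own, purely for illustration.

```python
# Per-GHz SPEC CPU2006 integer figures as quoted in the article.
# Assumption: "/G" means score per GHz; names here are illustrative only.
SCORE_PER_GHZ = {"3A3000": 7.0, "3A4000": 9.8, "3A5000": 10.6}


def estimated_score(score_per_ghz: float, clock_ghz: float) -> float:
    """Single-core score estimate: (score per GHz) x (clock in GHz)."""
    return score_per_ghz * clock_ghz


def relative_gain(old: float, new: float) -> float:
    """Fractional improvement of new over old (0.4 means +40%)."""
    return new / old - 1.0


if __name__ == "__main__":
    # 3A5000 at the clocks mentioned in the article.
    for clock in (2.2, 2.5, 2.7):
        score = estimated_score(SCORE_PER_GHZ["3A5000"], clock)
        print(f"3A5000 @ {clock} GHz: ~{score:.1f}")
    gain = relative_gain(SCORE_PER_GHZ["3A3000"], SCORE_PER_GHZ["3A4000"])
    print(f"3A3000 -> 3A4000 per-GHz gain: {gain:.0%}")
```

At 2.5 GHz this gives about 26.5, which lines up with the "exceeds 26" claim. The 7/G to 9.8/G step is about +40% per GHz, so the rest of the claimed doubling from 3A3000 to 3A4000 would have to come from clock speed or other factors.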
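The Loongson 3A5000 belongs to the same generation as the 3C5000 and 3D5000. The 3C5000 uses the LoongArch instruction set. The 16-core single chip scores above 9500 in UnixBench, with double-precision compute of 560 GFlops. The peak performance of the 16-core processor is comparable to that of a typical 64-core ARM processor, and it supports up to 16-way interconnect. Paired with the new-generation Loongson 7A2000 bridge chip, PCIe throughput bandwidth is more than 400% higher than the previous generation. In SPEC2006 testing, the single-core integer and floating-point Base scores both exceed 10/G, and the single-chip score exceeds 200. It can meet the computing needs of general-purpose computing, large data centers, and cloud computing centers. Through chip-level security mechanisms, the processor provides built-in CPU-level support for MLPS 2.0 (等保2.0), trusted computing, replacement with national cryptographic (国密) algorithms, and protection against network security vulnerabilities.
The 3C5000's defining feature is strong single-core performance, especially in a test like UnixBench, which emphasizes single-core and memory performance and has a very low multi-core speedup. Loongson reaches 9500 with just 16 cores, while a certain ARM CPU cannot reach that score even with 64 cores. Judging from public data, the 3C5000's performance is sufficient for the Xinchuang (信创, domestic IT substitution) market, and having 16 cores makes deployment fairly flexible.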
Says that the strength of the 3C5000 is single-core performance. Its 16-core design achieves a better result than a certain ARM CPU with 64 cores (I assume this is Phytium's S2500 CPU).
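The Loongson 3D5000 is a glued 32-core chip that packages two 3C5000 dies together. It integrates 64MB of L3 cache, supports up to 8 DDR4-3200 DRAM, and can build systems of up to four sockets through its HyperTransport interface, so a single machine can have up to 128 cores. In terms of performance, measured SPEC CPU2006 Base scores for single-socket and dual-socket 3D5000 servers can exceed 400 and 800 respectively, and a four-socket server is expected to reach 1600. The 3D5000 mainly targets scenarios with higher performance requirements; as long as the software ecosystem keeps up, it can fully replace Intel Xeon CPUs.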
The 3D5000 is two 3C5000 chiplets packaged together for 32 cores. With four 3D5000s, a server is expected to reach a SPEC CPU2006 score of 1600, satisfy HPC requirements, and replace Intel CPUs.
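The socket numbers in the article (400, 800, 1600) imply perfectly linear scaling. A rough Python sketch of that arithmetic follows. The `scaling_efficiency` parameter is a hypothetical knob of mine, not something from the article, added only to show how quickly the four-socket figure drops if scaling is less than ideal.

```python
CORES_PER_3C5000 = 16      # per the article
DIES_PER_3D5000 = 2        # two 3C5000 dies in one package
SINGLE_SOCKET_SCORE = 400  # article: "exceeds 400", used here as a floor


def total_cores(sockets: int) -> int:
    """Core count of a machine with the given number of 3D5000 sockets."""
    return CORES_PER_3C5000 * DIES_PER_3D5000 * sockets


def projected_score(sockets: int, scaling_efficiency: float = 1.0) -> float:
    """Projected score when each extra socket adds a fraction of one socket.

    scaling_efficiency is a hypothetical parameter: 1.0 means every added
    socket contributes a full single-socket score (linear scaling).
    """
    return SINGLE_SOCKET_SCORE * (1 + (sockets - 1) * scaling_efficiency)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} socket(s): {total_cores(n)} cores, "
              f"linear projection ~{projected_score(n):.0f}")
```

With linear scaling this reproduces the article's 800 and 1600, and four sockets gives the quoted 128 cores. Whether a real four-socket box scales that cleanly is exactly what the "expected" in the article leaves open.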
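Loongson 6000 performance will leap again
Some CPUs based on imported technology have seen slow IPC growth after bringing in overseas technology, with performance gains relying mostly on buying better EDA tools and better TSMC processes. By contrast, Loongson has consistently worked on improving its CPU microarchitecture design to raise performance, rather than blindly piling on cores. This steady approach has raised Loongson's IPC by 3 to 4 times over the past 10 years, with immediate results in desktop CPUs. It lets Loongson, even when one process generation behind a certain imported-technology ARM CPU, rely on its microarchitecture design to match or slightly exceed that ARM CPU's performance. When Loongson and that ARM CPU use the same process, Loongson can lead in performance thanks to its IPC advantage.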
Says that CPUs using imported ISAs take a long time to improve in IPC. They need better EDA tools or a better TSMC process to improve performance.
Loongson can improve IPC through improved microarchitecture. In the past 10 years, Loongson has improved IPC by 3 to 4 times. This allows its CPU to match or outperform a certain ARM CPU (I assume D2000 here) despite using a process one generation behind. When using the same process, Loongson can achieve a performance advantage over the ARM CPU (again, I assume Phytium here).
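The 3A6000 and 3A5000 use the same manufacturing process. Loongson relies on its design capability to raise CPU performance substantially, mainly by widening the architecture, for example going from 4-issue to 6-issue. Judging from previously published simulation results, integer performance is up 30% over the 3A5000 and floating point is up 60%, which is a startling improvement. If the simulation results match the final results, the 3A6000's SPEC06 single-core integer Base score exceeds 13/G and its floating-point Base score exceeds 16/G, basically reaching AMD Zen 2 level. If the 3A5000 is at 2.5G to 2.8G, then the 3A6000's SPEC06 single-core integer Base score exceeds 35 and its floating-point score will exceed 45.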
Loongson managed to raise integer performance by 30% and floating-point performance by 60% from the 3A5000 to the 3A6000 (in simulation) despite using the same process (by improving the core design, I guess).
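The projected 3A6000 scores can be checked with the same score-per-GHz arithmetic. This is a small Python sketch under the assumption that the 30% integer uplift applies to the 3A5000's quoted 10.6/G and that "/G" means score per GHz. The helper names are hypothetical.

```python
BASE_INT_PER_GHZ = 10.6  # 3A5000 integer figure quoted in the article
INT_UPLIFT = 0.30        # simulated integer gain for the 3A6000
INT_FLOOR_PER_GHZ = 13.0  # article: integer "exceeds 13/G"
FP_FLOOR_PER_GHZ = 16.0   # article: floating point "exceeds 16/G"


def apply_uplift(base_per_ghz: float, uplift: float) -> float:
    """Per-GHz score after a fractional uplift (0.30 means +30%)."""
    return base_per_ghz * (1.0 + uplift)


def score_range(per_ghz: float, clocks=(2.5, 2.8)) -> tuple:
    """Scores at each clock in GHz, as (score per GHz) x clock."""
    return tuple(per_ghz * clock for clock in clocks)


if __name__ == "__main__":
    projected = apply_uplift(BASE_INT_PER_GHZ, INT_UPLIFT)
    print(f"Projected integer per GHz: {projected:.2f}")
    print("Integer floor at 2.5/2.8 GHz:", score_range(INT_FLOOR_PER_GHZ))
    print("FP floor at 2.5/2.8 GHz:", score_range(FP_FLOOR_PER_GHZ))
```

A 30% uplift on 10.6/G gives about 13.78/G, consistent with "exceeds 13/G". At the 13/G floor that is roughly 32.5 at 2.5 GHz and 36.4 at 2.8 GHz, and at the 16/G floor roughly 40 and 44.8, so the article's "exceeds 35" and "exceeds 45" look like they assume the upper end of the clock range.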
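From public information, on the same process the 3A6000 performs 40% to 60% better than the 3A5000, with chip area reduced by 20%; the 12nm 3A6000 is benchmarked against the 7nm AMD Zen 2. On the most conservative estimate, the 3A6000's SPEC06 single-core integer Base score is 32 (@2.5G) to 35 (@2.8G). This performance is already clearly more than enough for both Xinchuang (信创) and everyday use.
It must be noted that simulations are often inaccurate; some companies overestimate and some underestimate. Judging from what Loongson has released in recent years, Loongson leans conservative, and measured results only come out better than simulated ones. Taking the recently taped-out 2K2000 as an example, measured results were 20% to 30% higher than Loongson's simulation, far exceeding Loongson's expectations. The 2K2000's LA364 core has basically caught up with ARM's A76, fully demonstrating the potential and vitality of the indigenous route.
The 3C6000 is a 16-core server chip whose core is the LA664, the same as in the 3A6000. The 3D6000 is two 3C6000 dies packaged together into a 32-core server CPU that can rival AMD EPYC with Zen 2 cores. As long as software keeps up, there is no longer a performance shortfall in the commercial market.
Loongson's next-generation 7000-series CPUs will further raise core performance, with IPC targeting Zen 3 and 12th-generation Core. They are planned for a 7nm process, with a most conservative SPEC06 integer Base estimate of 40. By then there will be a 24-32 core 3D7000 (7nm) and a 48-64 core 3E7000 (two dies in one package).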
The 3A6000's die area shrank by 20%. The target is Zen 2 (although in other places I've seen them comparing the 3A6000 to Zen 3). Generally, they boast about the new LA664 core to be used by the 6000 series. The 3C6000 is a 16-core CPU and the 3D6000 is two 3C6000s packaged together.
The 7000-series CPUs will use a 7nm process. This is quite interesting, since they are basically expecting SMIC to be able to produce 7nm CPUs for them by then, which would be in the 2024/2025 timeframe. They will have a 24-32 core 3D7000 and a 48-64 core 3E7000. Sounds like they are aiming to close the gap further with ARM CPUs.
After that, there's a bunch of discussion on the advantages of chiplets.
I think the key for Loongson is getting more domestic software developers to write applications for them and making sure those work with all the domestic OSes like Harmony, Euler, Kylin & UOS, so that all the domestic desktop/server computers can use them. It seems to me that x86 and ARM CPU designs don't have much of a future in China. They are a nice interim phase so that companies can move to domestic CPUs. But over the longer term, China will have to produce its own CPUs and won't have access to improvements in the ARM/x86 ISAs, so it will have to rely on RISC-V and LoongArch, to which it will have continued access. Both are also reduced instruction set architectures, which have lower power consumption.