Baidu primarily uses the Kunlun I for its cloud services, and I believe the Kunlun II will serve the same purpose. Both are FPGA-based, AI-optimized chips with low production runs; the Kunlun I's total run was only around 20,000 chips. The Kunlun II depends on Intel FPGAs, which are at 7nm and industry-leading in power and performance, and much of the Kunlun II's AI performance comes from that. If Baidu cared about raw performance alone, they wouldn't be using FPGAs at all, since those are "relatively" slow and expensive. This points clearly to specialization for their cloud services, and I expect the Kunlun II to become Baidu's newest cloud service chip.
It doesn't sound like they are using FPGAs, which should only be for prototyping. These are straight-up ASICs.
- "In the early days of exascale computing and AI, these customer-configurable integrated circuits played a key role. Organizations could program and reprogram FPGAs onsite to handle a range of changing demands. As time went on, however, their performance and market growth got outpaced by faster GPUs and specialized ASICs."
- "Now, innovations like high-speed AI tensor logic blocks, configurable embedded SRAM, and lightning-fast transceivers and interconnects are putting this early leader back in the race."
- “FPGAs offer hardware customization with integrated AI and can be programmed to deliver performance similar to a GPU or an ASIC,” explains Kuppuswamy. “The reprogrammable, reconfigurable nature of an FPGA lends itself well to a rapidly evolving AI landscape, allowing designers to test algorithms quickly and get to market fast and scale quickly”.
- "FPGAs are often used where data must traverse many different networks at low latency. They’re incredibly useful at eliminating memory buffering and overcoming I/O bottlenecks — one of the most limiting factors in AI system performance. By accelerating data ingestion, FPGAs can speed the entire AI workflow."
- "The company also announced a partnership with Intel on a series of AI projects, including FPGA-backed workload acceleration, a deep learning framework based on Xeon scalable processors. Intel did not identify which FPGA series Baidu would use, but the chip maker recently announced the integration of its Arria family with its mainstream Xeon server chip."
- "This portion of the deal is a cloud-based partnership, as Baidu said it was looking to develop a “heterogeneous” cloud computing platform based on Intel FPGAs."
- "Ouyang Jian also showed the killer feature of the Kunlun chip and its compatibility with the China-made processor Feiteng via video."
- "Baidu AI chip accumulation was due to its FPGA to do the accumulation of AI acceleration, but also thanks to its software-defined accelerator and XPU architecture years of accumulation."
- "Ouyang Jian said, "Compared to GPUs, the Kunlun chip has done a good job of being versatile and programmable, and we're still working on making the programmability better."
- "After the release of Kunlun, news about it was released one after another. Architecture-wise, Kunlun has 2 computing units, 512GB/S of memory bandwidth and 16MB SRAM/unit."
- "According to Ouyang Jian, 16MB SRAM is good for AI inference, XPU-SDNN on XPU architecture is designed for Tensor and so on, and XPU-Cluster can meet the needs of general processing."
Baidu's AI chip initiative is part of a larger heterogeneous cloud computing platform, meaning they use more than one chip for their cloud computing services. The Kunlun chip is the main chip in this architecture, but they also use the Feiteng CPU and Intel FPGAs alongside it for FPGA acceleration. The Kunlun chip itself is a sort of XPU accelerator, very likely FPGA based.
Articles describe Kunlun variously as a standalone GPU-style accelerator, a standalone XPU, an FPGA prototype that evolved into an accelerator, or one part of a heterogeneous AI chip architecture. Keep this in mind, because you will find the different stories confused about what Kunlun is, including some claiming it is an ASIC. Imo, "many" of these reporters are simply wrong. It is a fact that Baidu has used Intel FPGAs in its cloud architecture since Kunlun I, and we know Baidu primarily uses Kunlun for its cloud services. We also know Kunlun was designed to be versatile, configurable and programmable, which points to it being either directly FPGA based or co-processed with FPGAs. It's also telling that Kunlun is an AI chip with a large on-chip SRAM unit, reminiscent of the HBM used for FPGA acceleration in other AI chips.