The rapid development of artificial general intelligence (AGI) introduces significant performance challenges for next-generation computing. Electronic devices such as graphics processing units (GPUs) are constrained by computational and energy efficiency limitations, hindering the advancement of modern AGI models.
In contrast, photonic computing offers unprecedented low-power computing at the speed of light, promising superior performance for intelligent tasks.
Spatial photonic computing, exemplified by diffractive deep neural networks (D2NNs), achieves large-capacity computing but faces scalability issues owing to its reliance on passive photonic devices. Meanwhile, integrated photonic computing, which leverages highly scalable Mach-Zehnder interferometers (MZIs),
typically accommodates only hundreds to thousands of parameters, which poses a challenge for large-capacity computing. Additionally, inherent analog noise and time-varying errors confine these systems to simple tasks and shallow models, inadequate for real-world AGI applications.
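To make the MZI-based approach concrete, the following is a minimal numerical sketch (not from the paper) of how a mesh of MZIs performs a reconfigurable, lossless matrix multiplication on an optical field. The 2x2 transfer-matrix convention and the crossing order are one common textbook choice; all names and parameters here are illustrative assumptions.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of a Mach-Zehnder interferometer: two 50:50
    beam splitters with an internal phase shift theta and an external
    phase shift phi (one common convention; others differ by phases)."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # 50:50 beam splitter
    ps_int = np.diag([np.exp(1j * theta), 1.0])     # internal phase shifter
    ps_ext = np.diag([np.exp(1j * phi), 1.0])       # external phase shifter
    return bs @ ps_int @ bs @ ps_ext

def embed(t2, n, i):
    """Embed a 2x2 block acting on modes (i, i+1) into an n x n identity."""
    m = np.eye(n, dtype=complex)
    m[i:i+2, i:i+2] = t2
    return m

# A small mesh of six MZIs on 4 optical modes, applied as a sequence of
# nearest-neighbour crossings (illustrative ordering).
rng = np.random.default_rng(0)
u = np.eye(4, dtype=complex)
for i in [0, 1, 2, 0, 1, 0]:
    u = embed(mzi(rng.uniform(0, 2 * np.pi), rng.uniform(0, 2 * np.pi)), 4, i) @ u

# Each MZI is unitary, so the whole mesh is too: it performs a lossless,
# reconfigurable matrix-vector product on the input optical field.
assert np.allclose(u.conj().T @ u, np.eye(4))
x = rng.normal(size=4) + 1j * rng.normal(size=4)        # input optical field
y = u @ x                                               # optical matrix multiply
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))  # power conserved
```

Because each phase shifter is electrically tunable, the matrix the mesh implements can be reprogrammed, but the number of parameters grows with the number of MZIs, which is what limits the capacity of purely MZI-based designs.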
In their recently published paper, Xu et al. introduce Taichi, as illustrated in the figure: a large-scale, highly scalable distributed photonic computing architecture designed for real-world AGI tasks that leverages the advantages of both optical diffraction and interference.
The double diffractive units for large-scale input and output passively perceive high-dimensional data and represent them compactly through universal diffraction, as illustrated in the figure. Task-specific feature embeddings are achieved efficiently via tunable matrix multiplication with fully reconfigurable MZI arrays. Together, these components form Taichi's scalable "DE-IE-DD" framework, which significantly reduces the required scale of the reconfigurable MZI array and supports diverse, complex tasks while keeping only 3.8% of its 4,256 neurons reconfigurable. The distributed architecture further divides large tasks into subtasks that are processed in parallel by Taichi chiplets, as depicted in the figure. Computing resources are allocated to multiple independent clusters, each organized separately for a subtask, and their outputs are ultimately synthesized to handle complex, advanced tasks. The authors report an experimental accuracy of 91.89% on the 1,623-category Omniglot dataset, the first demonstration of on-chip thousand-category-level classification, achieved with 13.96 million neurons at an energy efficiency of 160 TOPS/W. This represents a highly promising approach in the field.
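The logic of the DE-IE-DD pipeline and the distributed clusters can be caricatured numerically. The sketch below is a toy model under loose assumptions, not the paper's implementation: fixed random unitaries stand in for the passive diffractive encoder/decoder, a small unitary core stands in for the reconfigurable MZI array, detected intensities stand in for outputs, and two "chiplets" each score half of the classes before their outputs are synthesized.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 8  # input modes; size of the reconfigurable (MZI-like) core

def random_unitary(n):
    """Random unitary standing in for a fixed, passive diffractive layer."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # fix column phases

def make_chiplet():
    encode = random_unitary(N)  # DE: fixed diffractive encoding (not trained)
    core = random_unitary(K)    # IE: reconfigurable core, the only tuned part
    decode = random_unitary(K)  # DD: fixed diffractive decoding to detectors
    return encode, core, decode

def run_chiplet(chip, x):
    encode, core, decode = chip
    h = (encode @ x)[:K]  # diffraction compresses N modes down to K features
    return np.abs(decode @ (core @ h)) ** 2  # detected intensities as scores

# Only the K x K core is tunable: a small fraction of all weights, echoing
# how Taichi keeps most of its neurons passive.
tunable_fraction = K * K / (N * N + K * K + K * K)
print(f"reconfigurable fraction: {tunable_fraction:.1%}")

# Distributed inference: two chiplets each handle a subtask (half of the
# classes), and their outputs are synthesized into one decision.
chips = [make_chiplet(), make_chiplet()]
x = rng.normal(size=N) + 1j * rng.normal(size=N)
scores = np.concatenate([run_chiplet(c, x) for c in chips])
print("predicted class:", int(np.argmax(scores)))  # over 2*K classes
```

The design point the sketch illustrates is the division of labour: scaling capacity comes from the cheap, passive diffractive parts and from adding chiplets, while the expensive reconfigurable hardware stays small.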