Alibaba Cloud showcases its self-developed network design for large language model training
On June 29, Alibaba Cloud announced the Ethernet network design it built specifically for the very large data transfers involved in training large language models (LLMs). The design has been in production use for eight months.
Alibaba Cloud chose Ethernet to avoid over-dependence on a few suppliers and to leverage the "power of the entire Ethernet Alliance to achieve faster development." The decision is also consistent with a broader trend: a growing number of vendors are backing Ethernet as a way to break away from Nvidia's NVLink monopoly on cloud AI interconnects.
Alibaba's Ethernet networking plans were revealed on the GitHub page of Ennan Zhai, a senior engineer and networking researcher at Alibaba Cloud. Zhai published a paper that will be presented in August at the SIGCOMM conference, the annual gathering of the Association for Computing Machinery's Special Interest Group on Data Communication.
The paper, titled “Alibaba HPN: A Data Center Network for Large Language Model Training,” begins by noting that cloud computing traffic “…generates millions of small flows (e.g., less than 10 Gbit/s)” while large language model training “generates a small amount of periodic, bursty traffic on each host (e.g., 400 Gbit/s).”
Equal-cost multipath (ECMP) routing is a commonly used method for spreading packets bound for a single destination over multiple paths, typically by hashing each flow's header fields to pick a link. With this kind of traffic, however, it is prone to hash polarization, a phenomenon that makes load balancing difficult and significantly reduces available bandwidth.
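To make hash polarization concrete, here is a minimal sketch in Python. It is not taken from the Alibaba paper: the two-tier topology, the seeded SHA-256 hash, the function names and the flow tuples are all illustrative assumptions, and real switches use vendor-specific hardware hash functions. The sketch shows what can happen when two switch tiers apply the same hash to the same flow headers: every flow that reaches the second tier over a given uplink makes the same choice again, so the other uplinks sit idle.

```python
import hashlib
from collections import Counter

def ecmp_hash(flow, seed):
    """Hash a flow 5-tuple with a per-switch seed (hypothetical stand-in for a hardware hash)."""
    key = f"{seed}|{flow}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def pick_uplink(flow, seed, n_uplinks):
    """ECMP-style choice: hash the flow, then take it modulo the number of equal-cost links."""
    return ecmp_hash(flow, seed) % n_uplinks

def tier2_usage(flows, tier1_seed, tier2_seed, n_uplinks=4):
    """Count tier-2 uplink choices for the flows that left tier 1 on uplink 0."""
    arriving = [f for f in flows if pick_uplink(f, tier1_seed, n_uplinks) == 0]
    return Counter(pick_uplink(f, tier2_seed, n_uplinks) for f in arriving)

# Made-up flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.0.0.1", f"10.0.1.{i % 250}", 17, 40000 + i, 4791) for i in range(1000)]

# Same hash on both tiers: flows polarize onto a single tier-2 uplink.
polarized = tier2_usage(flows, tier1_seed=1, tier2_seed=1)
# Different seed per tier: the same flows spread over several uplinks.
spread = tier2_usage(flows, tier1_seed=1, tier2_seed=2)

print("same hash on both tiers:", dict(polarized))
print("different seed per tier:", dict(spread))
```

In the polarized case, three of the four second-tier uplinks carry nothing. With only a handful of 400 Gbit/s flows per host rather than millions of small ones, even a well-seeded hash has too few flows to average out, which is consistent with the paper's stated motivation for reducing reliance on ECMP.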
Alibaba Cloud’s home-grown alternative, called High Performance Network (HPN), “avoids hash polarization by reducing the presence of ECMP, while also greatly reducing the search space for path selection, allowing us to accurately select network paths that can accommodate large traffic flows.”
HPN also addresses the fact that GPUs need to work synchronously when training large language models, which makes AI infrastructure sensitive to single points of failure—especially top-of-rack switches.
As a result, Alibaba's network design uses a pair of switches—but not in the stacked configuration recommended by switch vendors.