Artificial Intelligence thread

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
This is a logical comparison. Open-source 70B LLMs that are available for everyone to use are something an open-source provider can compare pretty well, and it's a good thing for Hugging Face to recommend.
 

tokenanalyst

Brigadier
Registered Member

Alibaba Cloud showcases its self-developed network design for large language model training


On June 29, Alibaba Cloud announced its Ethernet network design, created specifically for the ultra-large data transfers involved in training large language models (LLMs). The design has been used in production for 8 months.
Alibaba Cloud chose Ethernet out of a desire to avoid over-dependence on a few suppliers and to leverage the "power of the entire Ethernet Alliance to achieve faster development." This decision also seems to be in line with the fact that more and more manufacturers are beginning to support Ethernet and escape Nvidia's NVLink monopoly on cloud AI interconnection.

Alibaba's Ethernet networking plans were revealed on the GitHub page of Ennan Zhai, a senior engineer and networking researcher at Alibaba Cloud. Zhai published a paper that will be presented in August at the SIGCOMM conference, the annual gathering of the Association for Computing Machinery's Special Interest Group on Data Communication.

The paper, titled “Alibaba HPN: A Datacenter Network for Large-Scale Language Model Training,” begins by noting that cloud computing traffic “…generates millions of small flows (e.g., less than 10 Gbit/s)” while large language model training “generates a small amount of periodic, bursty traffic on each host (e.g., 400 Gbit/s)”.

Equal-cost multipath routing is a commonly used method for sending packets to a single destination over multiple paths, but it is prone to hash polarization, a phenomenon that makes load balancing difficult and significantly reduces available bandwidth.
Alibaba Cloud’s home-grown alternative, called High Performance Network (HPN), “avoids hash polarization by reducing the presence of ECMP, while also greatly reducing the search space for path selection, allowing us to accurately select network paths that can accommodate large traffic flows.”
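
For anyone wondering what hash polarization looks like, here is a minimal sketch. It is only an illustration of the problem, not Alibaba's HPN design. The assumptions are mine: a two-tier fabric with 4 uplinks per switch, made-up flow 5-tuples, and SHA-256 standing in for the simpler hardware hash a real switch would use. The helper names `ecmp_pick` and `second_tier_usage` are hypothetical. When both tiers hash the same way, every flow that reaches a given tier-2 switch picks the same uplink there, so the other uplinks sit idle.

```python
import hashlib
from collections import Counter


def ecmp_pick(flow, n_paths, seed):
    """Pick a next hop by hashing the flow 5-tuple together with a per-switch seed.

    SHA-256 is an assumption made for clarity; real switches use cheaper hashes.
    """
    digest = hashlib.sha256(f"{seed}|{flow}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


def make_flows(n):
    # Hypothetical flows: (src_ip, dst_ip, src_port, dst_port, proto)
    return [("10.0.0.1", "10.0.1.1", 10000 + i, 4791, "udp") for i in range(n)]


def second_tier_usage(flows, n_paths, seed_tier1, seed_tier2):
    """Count which uplinks tier 2 picks for the flows tier 1 sent to next hop 0."""
    arriving = [f for f in flows if ecmp_pick(f, n_paths, seed_tier1) == 0]
    return Counter(ecmp_pick(f, n_paths, seed_tier2) for f in arriving)


flows = make_flows(4000)

# Same hash and seed at both tiers: the tier-2 switch sees only flows whose
# hash maps to 0, so it also sends all of them to uplink 0 (polarization).
polarized = second_tier_usage(flows, 4, seed_tier1=0, seed_tier2=0)

# A different seed at tier 2 decorrelates the two decisions and spreads the load.
independent = second_tier_usage(flows, 4, seed_tier1=0, seed_tier2=1)

print("same hash at both tiers: ", dict(polarized))
print("different seed at tier 2:", dict(independent))
```

Even with good per-switch seeds, a handful of 400 Gbit/s flows can still collide on one link, which is presumably why the paper talks about reducing the presence of ECMP rather than just tuning the hash.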

HPN also addresses the fact that GPUs need to work synchronously when training large language models, which makes AI infrastructure sensitive to single points of failure—especially top-of-rack switches.
As a result, Alibaba's network design uses a pair of switches, but not in the stacked configuration recommended by switch vendors.

 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
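Moore Threads boasted about its KUAE GPU clusters.
They can now be scaled to tens of thousands of cards and provide 10+ EFLOPS of total compute.

Training can run continuously for 15+ days.

Moore Threads will launch three 10,000-card cluster projects: the Qinghai Zero-Carbon Industrial Park 10,000-card cluster project, the Qinghai Plateau KUAE 10,000-card cluster project, and the Guangxi ASEAN 10,000-card cluster project.
So they have 3 such clusters, 2 in Qinghai and 1 in Guangxi.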
 

sunnymaxi

Captain
Registered Member
China is now home to more than one-third of the world's 1,328 AI large language models and 15% of the nearly 30,000 AI enterprises worldwide, according to a whitepaper released at the Global Digital Economy Conference 2024 in Beijing on Tuesday.

China will develop more than 50 new national and industrial standards for AI by 2026 to facilitate the high-quality development of the AI industry. This goal is part of guidelines on standardizing systems for the AI industry that were jointly issued by four State government agencies yesterday.

The country also aims to participate in the formation of more than 20 international AI standards by 2026 to promote the development of the global AI sector, according to the guidelines. Furthermore, China aims to have more than 1,000 companies adopt and advocate for these new standards.
 