Artificial Intelligence thread

tokenanalyst

Brigadier
Registered Member

Chain-of-Experts: Unlocking the Communication Power of MoEs


Introduction

We propose Chain-of-Experts (CoE), which fundamentally changes sparse Large Language Model (LLM) processing by implementing sequential communication between intra-layer experts within Mixture-of-Experts (MoE) models.

In standard Mixture-of-Experts (MoE) models, experts process tokens independently and in parallel, with no communication between them, and the models carry high memory requirements. CoE introduces an iterative mechanism that lets experts "communicate": experts process tokens on top of the outputs produced by other experts.
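
To make the idea of experts "processing tokens on top of other experts' outputs" concrete, here is a minimal PyTorch-style sketch of such an iterative routing loop. It illustrates the general technique under assumed details and is not the authors' implementation: the names (ChainOfExpertsLayer, num_iterations, top_k), the router being re-used at every iteration, the residual connection, and the renormalized gating weights are all assumptions made for this sketch.

```python
# Minimal sketch of the iterative ("chained") expert processing described above.
# NOT the authors' released code; class/parameter names and details such as the
# shared router, residual connection, and renormalized gating are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert, as in a standard MoE layer."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.ff(x)


class ChainOfExpertsLayer(nn.Module):
    """Routes each token several times; each iteration's experts process the previous iteration's output."""
    def __init__(self, d_model=256, d_hidden=512, num_experts=8, top_k=2, num_iterations=2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(num_experts)])
        self.router = nn.Linear(d_model, num_experts)  # re-applied at every iteration (assumed shared)
        self.top_k = top_k
        self.num_iterations = num_iterations

    def forward(self, x):                                          # x: (num_tokens, d_model)
        h = x
        for _ in range(self.num_iterations):
            # Routing depends on the *current* hidden state, so later iterations can
            # choose different experts than earlier ones (the "communication" step).
            scores = F.softmax(self.router(h), dim=-1)             # (tokens, num_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)         # (tokens, top_k)
            weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over selected experts

            out = torch.zeros_like(h)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e                          # tokens whose k-th choice is expert e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(h[mask])

            h = h + out  # residual: the next iteration works on top of this combined output
        return h


if __name__ == "__main__":
    layer = ChainOfExpertsLayer()
    tokens = torch.randn(10, 256)
    print(layer(tokens).shape)  # torch.Size([10, 256])
```

A production implementation would batch the per-expert computation and add load-balancing terms; the loop above only shows the control flow of routing the same token through experts repeatedly.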

Experiments show that CoE significantly outperforms previous MoE models in multiple aspects:
  • Performance: CoE with 2x iterations reduces Math validation loss from 1.20 to 1.12
  • Scaling: 2x iterations matches performance of 3x expert selections, outperforming layer scaling
  • Efficiency: 17.6-42% lower memory usage with equivalent performance
  • Flexibility: 823x increase in expert combinations, improving utilization, communication, and specialization (see the counting sketch after this list)
These advantages constitute a "free lunch" effect, enabling efficient scaling of LLMs.
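
As a rough intuition for the flexibility bullet, the toy script below counts distinct expert combinations for repeated top-k routing versus a single parallel selection. The parameters N, k, and t are placeholders chosen only for illustration; the 823x figure quoted above comes from the paper's own configuration, which this sketch does not attempt to reproduce.

```python
# Toy counting sketch for the "flexibility" bullet. Parameters are placeholders
# for illustration only; the 823x figure quoted in the post comes from the
# paper's own configuration and is not reproduced here.
from math import comb

N = 64  # total experts in the layer (assumed)
k = 4   # experts selected per iteration (assumed)
t = 2   # sequential iterations (assumed)

single_pass = comb(N, t * k)   # one parallel selection of t*k experts
chained = comb(N, k) ** t      # an independent top-k selection at each of t iterations

print(f"single pass: {single_pass:.3e} expert combinations")
print(f"chained    : {chained:.3e} expert combinations ({chained / single_pass:.0f}x more)")
```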

English Blog:

Chinese Blog:



 

Legume7

New Member
Registered Member


This was made by Wang Zihan, a former DeepSeek intern who is now a PhD student at Northwestern.
 