Artificial Intelligence thread

nativechicken

New Member
Registered Member
Actually, I have read some arguments about this: Chinese may be a better language than English for training LLMs.

We all know that LLMs work on word tokens, but English is a creole phonetic language with loanwords from many other languages, even in its most basic vocabulary, which can confuse people and LLMs alike. For example, how can you tell the relationship between pork, pig, and hog? Moreover, when we tokenize an English word, we may get only arbitrary combinations of letters. For example, if we tokenize "lighter" into "ligh" and "ter", the pieces have no meaning. If we split it into "light" and "er", the pieces have meaning, but the word itself still shows no direct relationship between lighter and fire.
But Chinese is inherently tokenized by the characters themselves. "打火机" (lighter) can be tokenized as "打" (strike), "火" (fire), and "机" (machine), so both we and an LLM can see the inherent logical relationship between "lighter" and "fire". It is hard to say whether this is a huge advantage, but it would not surprise me if it were.
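To make the argument above concrete, here is a minimal toy sketch in Python. It is not how production LLM tokenizers work (many use byte-level BPE, which can split a single Chinese character into several byte tokens or merge several characters into one token); it only mirrors the character-level view described in this post. The function names and the `ENGLISH_SPLITS` table are made up for illustration.

```python
# Toy illustration only: real LLM tokenizers behave differently.

def char_tokens(text: str) -> list[str]:
    """Split a string into one token per character."""
    return list(text)

def shared(tokens_a: list[str], tokens_b: list[str]) -> set[str]:
    """Return the tokens that two token lists have in common."""
    return set(tokens_a) & set(tokens_b)

# Hypothetical splits of "lighter", taken from the examples in the post.
ENGLISH_SPLITS = {
    "arbitrary": ["ligh", "ter"],
    "morpheme": ["light", "er"],
}

lighter_zh = char_tokens("打火机")  # "lighter" in Chinese
fire_zh = char_tokens("火")         # "fire" in Chinese

print(lighter_zh)                   # one token per character
print(shared(lighter_zh, fire_zh))  # the token shared with "fire"

for name, split in ENGLISH_SPLITS.items():
    # Neither English split shares a token with the word "fire".
    print(name, shared(split, ["fire"]))
```

Under this simplified view, the Chinese word shares the token "火" with the word for fire, while neither English split of "lighter" shares anything with "fire". Whether that helps a real model is exactly the open question.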
When I was young (about 30 years ago), certain circles in China advocated that Chinese characters were more conducive to scientific research and technological development than English (Latin alphabet). With a core set of 6,000-8,000 commonly used characters that can be flexibly combined to create modern technical terminology, Chinese enables intuitive comprehension of technical concepts through literal interpretation without requiring specialized jargon acquisition. In contrast, English academic literature relies heavily on abbreviations that demand prolonged specialized study to decipher, thereby elevating professional barriers. Chinese materials present minimal obstacles for cross-disciplinary learning - I can effortlessly navigate and comprehend various specialized documents within Chinese literature databases.

Many assessments of China's technological capabilities overlook a critical factor: decades of sustained literature digitization and systematic translation and reorganization of English scientific works into Chinese educational resources. This infrastructure enables most Chinese researchers to learn and conceptualize in their native language, bypassing the cognitive burden of English interpretation. (While Shakespeare's era treated a vocabulary of 20,000-30,000 English words as linguistic mastery, modern English vocabulary has ballooned to millions, making cross-disciplinary learning particularly challenging. Chinese learners face concentrated difficulty in elementary literacy acquisition, but by high school they achieve seamless cross-disciplinary comprehension across STEM fields - a stark contrast to Western educational trajectories.)

In my analysis, China stands as the world's second nation to establish a comprehensive independent scientific literature ecosystem. This system constitutes both a formidable competitive advantage and a strategic moat. The full magnitude of this advantage remains underappreciated today, likely requiring 20-30 years and a generation of bilingual researchers to articulate convincingly to non-Chinese audiences.

My recent experience with DeepSeek reveals fascinating linguistic dimensions. Its Chinese Q&A examples circulating online demonstrate astonishing linguistic sophistication. While users like myself employ various AI tools (ChatGPT included), there's growing recognition that Western observers miss DeepSeek's capabilities in classical Chinese composition and poetry - artistic expressions that reveal Chinese linguistic richness (which makes English expression appear comparatively impoverished through Chinese cultural lenses). Few recognize that DeepSeek's Chinese performance might surpass its English capabilities, a testament to its fundamentally Chinese cognitive architecture. This linguistic foundation - not mere technical superiority - may constitute DeepSeek's true strategic moat in open-sourcing its technology globally.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member

Nvidia revenue by country over the past few quarters.

I wonder how Singapore suddenly went from nothing last year to $17.4B in just Q4. Wow, amazing. There is more chip demand for Nvidia from Singapore & Taiwan than from Mainland China.

What's going on here?

Anyone want to guess how many H100 GPUs can be bought for $37B?
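
For a rough sense of scale, the question above is simple division once a unit price is chosen. The sketch below is a back-of-the-envelope estimate only: the per-GPU prices are assumptions picked for illustration, not Nvidia list prices, and reported revenue also covers complete systems, networking, and other products, so the real GPU count would be lower.

```python
# Assumed H100 unit prices in USD; illustrative guesses, not official pricing.
ASSUMED_UNIT_PRICES_USD = [25_000, 30_000, 40_000]

# The $37B figure from the question above.
BUDGET_USD = 37_000_000_000

def gpus_for_budget(budget_usd: int, unit_price_usd: int) -> int:
    """Whole GPUs purchasable if the entire budget went to GPUs at one price."""
    return budget_usd // unit_price_usd

for price in ASSUMED_UNIT_PRICES_USD:
    count = gpus_for_budget(BUDGET_USD, price)
    print(f"${price:,} per GPU -> {count:,} GPUs")
```

Under these assumed prices the answer lands somewhere around one to one and a half million GPUs, so the result is only as good as the price guess.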
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member

Actually, I had to delete that because it was not accurate.

This is something else I found


Keep in mind that a good chunk of this comes from Western companies that are building data centers in the ASEAN region and have their corporate offices in Singapore.

But there is no question that a chunk of it is smuggled into China. The same goes for chips that landed in Taiwan, Korea and Japan.
 

siegecrossbow

General
Staff member
Super Moderator

India lauds Chinese AI lab DeepSeek, plans to host its models on local servers

This is the true genius of making it open source. It gives countries that are not at the forefront of AI a fighting chance. The geopolitical goodwill this generates cannot be overstated.
 

nativechicken

New Member
Registered Member

I have friends who work at a well-known domestic IDC... For various reasons, I happen to know some operational details (which I won't disclose).
I can only say that the sources are definitely not limited to Singapore. A100 chips are not in short supply in China. People are still stockpiling them, grabbing as many as they can. It is not just the A100: other NVIDIA chips that can be used for compute, such as the H100 and the 4090, are being hoarded too. Some people are even modifying the hardware, for example expanding the 4090's memory to 48 GB. In mainland China, cards like the 2090, 3090, and 4090 have all been modified. If you really have the goods, you can simply mark up the price and do a cash transaction.
The U.S. government is also aware of this situation, which is why the U.S. Department of Commerce has imposed a series of restrictions and created three different tiers for selling these chips.
With or without DeepSeek, China's build-out of computing infrastructure will not slow down at all. The amount of funding is staggering: many state-owned enterprises that don't understand much about this field are building compute data centers, choosing locations based on where electricity is cheaper. There are specific targets that need to be met, and only areas with electricity prices below a certain level are eligible for investment (I won't go into the specific targets). Right now, what is lacking is not compute capacity but AI applications that can truly generate revenue.
 