The problem facing Chinese domestic AI chips is the ecosystem: software compatibility and ease of use. Those gaps push the overall cost of ownership above Nvidia's. That is why, even with Nvidia chips selling at sky-high prices, Chinese companies are still rushing to buy them.
In fact, beyond the gap in hardware performance, the software ecosystem is another weak point for domestic AI chip makers.
A chip has to be adapted across multiple layers, including the hardware system, the toolchain, and the compiler, and that adaptation has to be robust. Otherwise the same chip may deliver 90% of its computing power in one workload but only 80% in another.
As mentioned above, Nvidia has a clear advantage here. As early as 2006, Nvidia launched its computing platform CUDA, a parallel-computing software engine. The CUDA framework bundles much of the code needed to invoke GPU computing power, so engineers can use it directly instead of writing everything from scratch. With CUDA, developers can run AI training and inference more efficiently and extract more of the GPU's performance. Today CUDA has become part of the basic infrastructure of AI: mainstream AI frameworks, libraries, and tools are all built on top of it.
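To make the lock-in concrete, here is a minimal sketch of what CUDA code looks like. It is a toy vector-add example, not taken from the article: the `<<<blocks, threads>>>` launch syntax, the runtime API calls, and the nvcc compiler are all Nvidia-specific, which is exactly the switching cost described here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements; CUDA's runtime and
// compiler (nvcc) schedule these threads across the hardware.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side buffers
    float* ha = (float*)malloc(bytes);
    float* hb = (float*)malloc(bytes);
    float* hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device-side buffers: cudaMalloc/cudaMemcpy are CUDA runtime APIs
    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Kernel launch with CUDA-specific syntax; code written this way
    // assumes Nvidia's toolchain end to end.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaDeviceSynchronize();

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);  // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Any rival chip must either reimplement a compatible layer for code like this or persuade developers to rewrite it for a different toolchain.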
Without this programming layer, it would be vastly harder for software engineers to extract the hardware's value.
GPUs and AI chips from vendors other than Nvidia must supply their own adaptation software to hook into the CUDA ecosystem. One industry insider recalls dealing with a non-Nvidia GPU vendor: although the vendor quoted lower prices for chips and services and promised more responsive support, the overall cost of training and development on its GPUs would have been higher than on Nvidia's, with added uncertainty around results and development timelines.
Nvidia GPUs are expensive, yet in practice they end up being the cheapest to use. For companies intent on seizing the large-model opportunity, money is usually not the constraint; time is the scarcer resource, and everyone must secure enough advanced computing power as fast as possible to lock in a first-mover advantage.
So for domestic chip suppliers, even if stacking chips can yield a product with comparable raw computing power, software adaptation and compatibility make it a hard sell to customers. And from a server-operations perspective, the extra motherboard costs, electricity, and operating expenses, along with the power-draw and cooling issues they bring, substantially raise a data center's operating costs.
Because computing power usually has to be offered as a pooled resource, data centers prefer to standardize on a single chip, or at least on chips from a single vendor, to keep pooling manageable.
Unlocking computing power takes intricate hardware-software co-engineering to turn a chip's theoretical performance into effective performance. For customers, putting domestic AI chips to work is not easy: switching cloud AI chips carries real migration costs and risks, and unless the new product offers a performance edge or solves a problem in some dimension that nobody else can, customers have little appetite to switch.
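A back-of-the-envelope way to see the gap between theoretical and effective compute (illustrative numbers, not from the article): effective compute = peak compute x sustained utilization. A chip rated at 300 TFLOPS that the software stack can only drive at 40% utilization delivers 120 effective TFLOPS, while a 250 TFLOPS chip with a mature stack running at 55% delivers about 138. The nominally slower chip does more real work, which is why the software ecosystem, not the spec sheet, decides the purchase.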