Indeed, I realized the usual way of reasoning is not the right approach for GPUs dedicated to AI.
When a single GPU contains thousands of identical, very small cores, as is the case with modern NVIDIA, AMD, Huawei, etc. parts, the effective yield works out differently.
The point is, manufacturers are able to disable defective cores on the GPU die at packaging time. One extreme case is the Cerebras Wafer Scale Engine, which is as big as the whole wafer! Cerebras has shown that its yield is effectively 100% (!).
This is possible because, at design time, some spare cores and links are reserved and are later used in place of the defective ones, which are isolated and disabled in some way (I've read AMD uses electrical fuses to do this).
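The effect of spare cores on yield can be sketched numerically. Below is a toy model of my own (the core counts, spare fraction, and defect rate are illustrative assumptions, not any vendor's real data): each core fails independently, and a die is saleable as long as the number of defective cores does not exceed the spares reserved at design time.

```python
import math

def redundant_yield(total_cores: int, spare_cores: int, p_core_defect: float) -> float:
    """Probability a die is saleable, i.e. at most `spare_cores` cores are defective.

    Simple binomial model: each core fails independently with p_core_defect.
    """
    prob = 0.0
    for k in range(spare_cores + 1):
        prob += (math.comb(total_cores, k)
                 * p_core_defect ** k
                 * (1 - p_core_defect) ** (total_cores - k))
    return prob

# Illustrative numbers: 1024 tiny cores, 16 spares, 0.2% per-core defect rate.
print(f"with spares:    {redundant_yield(1024, 16, 0.002):.4f}")
print(f"without spares: {redundant_yield(1024, 0, 0.002):.4f}")
```

Even a small spare pool pushes the yield to essentially 100%, while the same die with zero redundancy would be scrapped most of the time.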
So let's try another approach!
Under current lithography standards, the maximum die size is about 858 mm², set by the standard 104 mm × 132 mm photomask with a 4× reduction when projected onto the wafer.
So assume that our hypothetical
"Max AI" chip is the full 26 × 33 = 858 mm² (if it were smaller, the number of dies per wafer would be higher, so this is a worst case).
Now assume the manufacturer can almost always "fix" the chip by disabling defective cores or by binning it to a lower spec. Only in the rare cases where the defects hit critical, non-redundant areas of the die is the chip unrecoverable and scrapped. So let's assume a
95% yield (remember Cerebras reaches 100% using the full wafer).
Under these extreme conditions we get 62 dies × 95% yield ≈
59 good dies per wafer.
It means that
1M "Max AI" chips would require about 17K wafers.
This is the worst case, and it is probably close to where the NVIDIA A100 sits.