I've met him irl. Dylan comes off as an ML enthusiast who thinks he's qualified to talk about chips and GPUs just because he's got low-level connections in Taiwan and at Nvidia. The only bigshots who pay any attention to him are VCs and software numbskulls like Sam Altman or Elon who don't know the first thing about hardware.
The Kirin 9000S is the first ever mobile ARM SoC to implement SMT. Has anybody posted SMT (hyperthreading) perf numbers for the Kirin chip?
Really depends on the code.
Code that is heavily threaded and also has a high cache miss rate might get some extra performance out of it.
When a piece of code has a cache miss because the data is not in cache, the CPU has to retrieve the data from memory, making the code wait 300~400 cycles (could be an order of magnitude smaller, like 30~40 cycles) before the data arrives and the core can continue. A core with hyperthreading or SMT will try to do something else in those 300~400 cycles, interleaving code from another thread that can run because its data is already in cache.
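To put rough numbers on that, here is a toy cycle counter in Python. It is only a sketch under heavy assumptions: a single-issue core, threads that alternate fixed bursts of work with one cache miss, no out-of-order execution, no prefetching, and no OS context switches. The `simulate` function and all of its parameters are made up for illustration. It also behaves more like switch-on-stall multithreading than real SMT, which can issue from both threads in the same cycle.

```python
def simulate(num_threads, bursts, burst_len, miss_latency, smt_ways):
    """Return (total_cycles, busy_cycles) for a toy single-issue core.

    Each thread runs `bursts` compute bursts of `burst_len` cycles, with a
    cache miss of `miss_latency` cycles between consecutive bursts.
    `smt_ways` is the number of hardware thread contexts (1 = no SMT).
    """
    waiting = [{"bursts": bursts, "work": burst_len, "ready_at": 0}
               for _ in range(num_threads)]
    contexts = []  # threads currently loaded on the core
    cycle = 0
    busy = 0
    while waiting or contexts:
        # Load queued threads into free hardware contexts.
        while waiting and len(contexts) < smt_ways:
            contexts.append(waiting.pop(0))
        # Issue from the first context that is not stalled on a miss.
        thread = next((t for t in contexts if t["ready_at"] <= cycle), None)
        if thread is not None:
            busy += 1
            thread["work"] -= 1
            if thread["work"] == 0:
                thread["bursts"] -= 1
                if thread["bursts"] == 0:
                    contexts.remove(thread)  # thread is finished
                else:
                    # Burst done: stall this thread until the miss resolves.
                    thread["work"] = burst_len
                    thread["ready_at"] = cycle + 1 + miss_latency
        cycle += 1
    return cycle, busy


# Two threads, 10 bursts of 100 cycles each, 300-cycle misses.
print(simulate(2, 10, 100, 300, smt_ways=1))  # (7400, 2000)
print(simulate(2, 10, 100, 300, smt_ways=2))  # (3800, 2000)
# Same work with no misses: the second context buys nothing.
print(simulate(2, 10, 100, 0, smt_ways=1))    # (2000, 2000)
print(simulate(2, 10, 100, 0, smt_ways=2))    # (2000, 2000)
```

In this model the useful work is 2000 cycles either way. The second context only fills stall cycles that would otherwise be wasted, so the miss-heavy case nearly halves its wall-clock cycles while the cache-friendly case does not change at all.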
It will do nothing for the theoretical max performance of a core, but it might make badly optimised threaded code perform better.
But it has been a while since I did high performance coding, so I might be mistaken.