Sidetracking a bit here... One major reason is that RISC-V is not a very well-designed ISA.
The ISA is well designed for what it was intended to do: to be used in college hardware design classes at UC Berkeley that teach how to design modern CPUs.
Some of the design choices are questionable at best. Qualcomm, Alibaba T-Head and even Huawei have their own modified versions of RISC-V. If you follow RISC-V news you can find that Qualcomm publicly proposed major fixes for RISC-V but got rejected.
We joked about scaling all the way from embedded to server space when we originally delineated the ISA.
So you can go from RV32E all the way to RV128.
Little did we know how far the ISA would come. The biggest mistakes in the specification, IMHO, with regard to the server space are the lack of indexed memory addressing and the performance of unaligned memory accesses. I did put those as desirable features in the initial specification, but the college students torpedoed that because it would have made their task of designing a CPU in a semester harder. So much for that.
I do think T-Head has some proprietary extensions which do add indexed memory addressing.
As for Qualcomm, much of what they have claimed RISC-V should add or remove is bullshit and a bad idea. They are just being lazy and trying to turn RISC-V into ARM. Take the variable-length instructions in RISC-V, for example. They are actually a great idea for compressing code size, so you need less i-cache and get fewer misses. I usually joke that RISC-V is the most CISCy RISC processor.
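To make the variable-length point concrete, here is a minimal sketch of how a decoder can tell a compressed instruction from a standard one by looking only at the low bits of the first 16-bit parcel. The function name `insn_length` is made up for illustration, and the sketch deliberately gives up on encodings longer than 32 bits.

```c
#include <stdint.h>

/* Length in bytes of the instruction whose first 16-bit parcel is `parcel`.
 * Returns 2 for a compressed (C extension) instruction, 4 for a standard
 * 32-bit instruction, and 0 for the longer encodings, which this sketch
 * does not try to handle. `insn_length` is a hypothetical helper name. */
int insn_length(uint16_t parcel)
{
    if ((parcel & 0x3u) != 0x3u)
        return 2;   /* bits [1:0] are 00, 01 or 10: 16-bit instruction */
    if ((parcel & 0x1cu) != 0x1cu)
        return 4;   /* bits [1:0] == 11 and bits [4:2] != 111: 32-bit */
    return 0;       /* longer than 32 bits: out of scope here */
}
```

For example, `insn_length(0x4501)` (the parcel for `c.li a0, 0`) gives 2, while `insn_length(0x0513)` (the low half of `addi a0, zero, 0`) gives 4. The check is cheap, but a wide front end has to do it at every possible 2-byte boundary, which is presumably part of what the complaints are about.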
And it is not like there is anything preventing you from supporting unaligned memory accesses in your own hardware implementation. It is just that it isn't mandatory as part of the spec.
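On the software side, the practical consequence is that portable code should not assume a misaligned load is fast, or that it works as a plain pointer dereference at all. A common idiom, sketched here with a hypothetical helper name, is to go through `memcpy` and let the compiler pick the best sequence for the target.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
 * memcpy avoids the undefined behaviour of casting to uint32_t *; the
 * compiler may lower it to a single load where misaligned access is known
 * to be fast, or to byte loads where it is not. The result is in host
 * byte order. `load_u32_unaligned` is a hypothetical helper name. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling it on `buf + 1` for some byte buffer `buf` is well defined on any implementation, whether or not the hardware handles the misaligned case natively.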
As for the lack of flags you can blame me.
I demanded that as part of the architecture to make it easier to build OoO superscalar processors.
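For anyone wondering what code looks like without a flags register: the carry simply becomes an ordinary value in a general-purpose register, produced by a compare. A rough sketch in C follows; the function names are made up, and the assembly in the comment is what a compiler would typically emit, not a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 64-bit add with carry-out. On a machine with flags the carry
 * comes for free from the add; on RISC-V it is recomputed, roughly:
 *     add  t0, a0, a1
 *     sltu t1, t0, a0
 * `add_carry_u64` and `add_u128` are hypothetical names for illustration. */
bool add_carry_u64(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;       /* wraps modulo 2^64 */
    return *sum < a;    /* it wrapped iff the result is below an input */
}

/* 128-bit add built from two 64-bit limbs, least significant limb first. */
void add_u128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    uint64_t lo;
    bool carry = add_carry_u64(a[0], b[0], &lo);
    out[0] = lo;
    out[1] = a[1] + b[1] + (uint64_t)carry;
}
```

It costs an extra instruction per limb, but the carry is renamed and scheduled like any other register value, with no implicit state written by every arithmetic instruction.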
Another thing it could use, I guess, is a proper multiple-register save and restore instruction. We did consider it, but back then I didn't know about the ARM instructions and had never coded in ARM assembler myself, and the x86 register save instructions are kind of useless anyway.
In hindsight I am not even sure whether ARM had those instructions back when we came up with the initial draft RISC-V spec.
RISC-V is good enough mostly for embedded cores but a somewhat flawed ISA for high-performance computing. Working with RISC-V for an extended period of time usually makes one appreciate how well-designed ARM AArch64 is. Unless either Qualcomm or Huawei is completely cut off from ARM so that they are forced to move on to an improved version of RISC-V, we will probably not see any serious efforts in RISC-V outside embedded markets.
You can get RISC-V to work for high-performance computing even with the regular ISA. You will need a wider processor because you will have more ops because of the lack of indexed memory addressing and other things. But it is not like it is impossible to do one.
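To put a number on "more ops", take a plain array load. The instruction sequences in the comment are what I would expect a compiler to emit with the array base in the first argument register and the index in the second; actual output depends on compiler and flags, so treat them as illustrative. `load_elem` is just a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* a[i] for 4-byte elements: address = base + (i << 2).
 *
 * Typical code (illustrative, not guaranteed compiler output):
 *   AArch64:        ldr    w0, [x0, x1, lsl #2]    ; 1 instruction
 *   RV64 base ISA:  slli   a1, a1, 2
 *                   add    a0, a0, a1
 *                   lw     a0, 0(a0)               ; 3 instructions
 *   RV64 with Zba:  sh2add a0, a1, a0
 *                   lw     a0, 0(a0)               ; 2 instructions
 */
int32_t load_elem(const int32_t *a, size_t i)
{
    return a[i];
}
```

So the base ISA needs three instructions where AArch64 needs one, and the Zba shift-and-add instructions (which, as far as I know, are part of the RVA22 and RVA23 profiles) bring that down to two. In a loop the compiler will often strength-reduce the index into a running pointer anyway, which hides most of the cost.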
Good evidence for this is that people like Jim Keller have looked at the architecture and think there is nothing majorly wrong with it.
The ISA is also so extensible that if something does become a major issue it can be added later. The latest RVA23 spec is way more complete than the initial RV64GC.
RISC-V is probably the only current architecture designed from the outset to be able to support hypervisors in the future. It also has features that other architectures lack, despite being so simple. It was designed to be extensible from the start. With the variable-length instructions you get even more ISA space without bloating the code size.
A lot of people claim that the latest RISC-V profile, with the built-in instruction compression, always provides the smallest binaries of the major architectures, even compared with CISC x86-64. So that is something. A lot of people overly focus on reducing the number of instructions instead of code size. But they forget that while logic continues to shrink, memory (particularly SRAM) is shrinking way slower. It is important to keep the memory footprint down. If you keep spilling out of the cache and have to fetch from DRAM, you slow everything down.