SiliconCloud, which uses Huawei Cloud's Ascend Cloud technology, has rolled out support for V3 and R1 on its platform.
If you look at their model page, it supports essentially all the major Chinese open-source LLMs, so there is no issue with Ascend being used for inference.
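Throughput for such deployments is usually reported in tokens per second, i.e. completion tokens divided by wall-clock time. A minimal sketch of how you might measure it against an OpenAI-compatible endpoint (the client wiring, endpoint, and model name in the comments are assumptions for illustration, not confirmed details of SiliconCloud's API):

```python
import time


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput as completion tokens generated per wall-clock second."""
    return completion_tokens / elapsed_s


# Hypothetical measurement against an OpenAI-compatible endpoint
# (model name and client setup are assumptions):
#
#   start = time.time()
#   resp = client.chat.completions.create(model="...", messages=[...])
#   print(tokens_per_second(resp.usage.completion_tokens, time.time() - start))

# Offline example: 380 tokens generated in 10 s is 38 t/s.
print(tokens_per_second(380, 10.0))
```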
This source claims inference at around 38 tokens/s, which is not bad: