A Tsinghua team has open-sourced a new large-model inference engine that supports FP8 inference on non-Hopper NVIDIA GPUs as well as domestic (Chinese) GPUs, so FP8 models can run across the full range of NVIDIA and domestic hardware. The engine scales from pure-CPU deployment, to a single GPU, up to large clusters.
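Native FP8 tensor-core arithmetic only arrives with NVIDIA's Hopper generation, so an engine targeting older GPUs presumably stores weights in FP8 and decodes them to a wider format (FP16/BF16) before the matmul. A minimal sketch of decoding the common E4M3 FP8 encoding (assuming OCP E4M3 semantics: 1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7; this is an illustration of the general technique, not the engine's actual kernel):

```python
def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one OCP E4M3 FP8 byte to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits
    man = byte & 0x7          # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return float("nan")   # E4M3 reserves only this pattern; no infinities
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent bias of 7
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# A real engine would decode whole weight tensors this way (vectorized,
# usually fused into the GEMM) and multiply each block by a stored scale.
decoded = [fp8_e4m3_to_float(b) for b in (0x38, 0x40, 0xB8, 0x00)]
print(decoded)  # [1.0, 2.0, -1.0, 0.0]
```

In practice the dequantization is done in a fused CUDA kernel with per-block scaling factors, so the memory savings of FP8 weights are kept without needing Hopper's FP8 compute units.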
In tests on A800 GPUs against an international open-source framework, it delivered a 3.15x improvement in inference speed while using 50% fewer resources.