Inspur has built the 元脑 R1 inference server (元脑 R1 推理服务器) for the full-size DeepSeek model. It provides 1128 GB of HBM3e memory, enough for FP8-precision inference, and its GPU memory bandwidth reaches 4.8 TB/s.
GPU peer-to-peer (P2P) bandwidth reaches 900 GB/s.
A single machine can support 20-30 concurrent users on the latest inference frameworks (see the back-of-envelope sketch below).
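For a rough sanity check on those numbers, here is a minimal back-of-envelope sketch in Python. It assumes DeepSeek-R1's published parameter count (~671B, so roughly 671 GB of weights at 1-byte FP8) and the MLA cache dimensions from the public DeepSeek-V3/R1 config (61 layers, 512-dim latent plus 64-dim RoPE key per token); the 32K context length is just an illustrative choice, not something from the announcement.

```python
# Back-of-envelope memory math for the quoted server specs.
# Assumptions (not from the announcement): DeepSeek-R1 has ~671B params,
# FP8 weights take 1 byte each, and the MLA KV cache stores 512 + 64
# dims per token per layer at 1 byte each (61 layers, per the public
# DeepSeek-V3/R1 config). The 32K context is an arbitrary example.

HBM_TOTAL_GB = 1128      # quoted HBM3e capacity of the machine
PARAMS = 671e9           # DeepSeek-R1 total parameter count
LAYERS = 61              # transformer layers
KV_DIMS = 512 + 64       # MLA latent + decoupled RoPE key dims per token
CONTEXT_TOKENS = 32_768  # example per-user context length

weights_gb = PARAMS * 1 / 1e9                              # FP8: 1 byte/weight
kv_per_user_gb = LAYERS * KV_DIMS * CONTEXT_TOKENS / 1e9   # 1-byte cache entries
headroom_gb = HBM_TOTAL_GB - weights_gb

print(f"FP8 weights:            ~{weights_gb:.0f} GB")     # ~671 GB, fits
print(f"KV cache per user:      ~{kv_per_user_gb:.2f} GB") # ~1.15 GB at 32K
print(f"headroom after weights: ~{headroom_gb:.0f} GB "
      f"(~{headroom_gb / kv_per_user_gb:.0f} such sessions)")
```

By this estimate, memory capacity alone would allow far more than 30 sessions, so the quoted 20-30 concurrent users presumably reflects the compute and bandwidth limits of the serving stack rather than HBM capacity.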
It seems to me that they are probably using Nvidia GPUs under the hood but don't want to mention it; the specs line up with the H200 rather than the H100 (8 × 141 GB of HBM3e = 1128 GB, 4.8 TB/s memory bandwidth, 900 GB/s NVLink).