Moore Threads Completes End-to-End Engineering Adaptation of DeepSeek-V4: S5000 Enables Rapid Deployment of Complex MoE Models on MUSA + SGLang
Moore Threads recently completed full run-through verification of DeepSeek-V4 on the open-source SGLang inference framework, using its flagship MTT S5000, an AI compute card built for both training and inference, together with its in-house MUSA software stack. The result shows that, for the next generation of large MoE models, Moore Threads has built a systematic adaptation path that runs from the core compute engines of the hardware architecture, through support for hot-spot operators, to end-to-end deployment verification. With "framework-level compatibility and out-of-the-box deployment", it demonstrates that domestic GPU platforms can host cutting-edge large models and carry them into engineering practice.
As large-model architectures continue to evolve, advanced models such as DeepSeek-V4 place stringent demands on low-level numerical precision, operator coverage, compilation optimization, parallel communication, and inference efficiency. Moore Threads draws on the native FP8 compute capability of the S5000, MUSA's deep CUDA compatibility, and the TileLang MUSA compiler's full support for the TileLang ecosystem. By reusing the open-source TileKernels library and rapidly developing custom operators in TileLang, Moore Threads quickly established a DeepSeek-V4 inference adaptation pipeline. This further validates Moore Threads' ability to offer developers and industry users an efficient, easily deployable domestic hardware and software foundation for running large models.
Notably, TileLang-MUSA has been officially merged into the TileLang mainline and provides Day-0 support for TileKernels, the TileLang operator library newly released with DeepSeek-V4. The MUSA platform therefore has the engineering foundation to support the cutting-edge LLM operator ecosystem, and it offers a directly reusable operator path for adapting further advanced open-source models.
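To give a sense of the kind of hot-spot operator an MoE model depends on, the following is a minimal pure-Python sketch of top-k expert routing for a single token. It is an illustration under stated assumptions only: the function names (`route_top_k`, `moe_forward`), the toy experts, and the choice of softmax gating with renormalized weights are hypothetical and are not taken from DeepSeek-V4, TileKernels, MUSA, or SGLang, whose actual kernels are not described in this article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-probability experts for one token.

    Returns (expert_index, weight) pairs whose weights are
    renormalized to sum to 1. This gating scheme is an assumption
    chosen for illustration, not a specific model's router.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, k=2):
    """Combine the outputs of the selected experts, weighted by the router."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy experts: scalar functions standing in for expert FFNs.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

routes = route_top_k(logits, k=2)
y = moe_forward(3.0, logits, experts, k=2)
```

In a production inference stack, this selection and weighted combination would run as fused GPU kernels over whole batches of tokens; the sketch only shows the logic that such operators implement.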