Qwen has officially released the open-source Qwen2.5-1M models along with the corresponding inference framework support.
Qwen (Tongyi Qianwen) released two new open-source models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M. This is the first time Qwen has extended the context length of its open-source models to 1M tokens.
To help developers deploy the Qwen2.5-1M series models more efficiently, the Qwen team has fully open-sourced a vLLM-based inference framework with an integrated sparse attention method, which makes the framework 3 to 7 times faster when processing 1M-token inputs.
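For orientation, here is a minimal sketch of loading one of the released models through vLLM's offline Python API. The model name follows the release; the parallelism and length settings are illustrative assumptions, and the long-context build from the Qwen team may expect different flags.

```python
# Minimal sketch: running Qwen2.5-7B-Instruct-1M with vLLM's offline API.
# Assumes the Qwen team's long-context vLLM build; exact flags may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",
    tensor_parallel_size=4,       # spread the weights and KV cache across 4 GPUs
    max_model_len=1_010_000,      # context window, prompt plus generated tokens
    enable_chunked_prefill=True,  # prefill the long prompt in chunks
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["<a very long document>\n\nSummarize the above."], params)
print(outputs[0].outputs[0].text)
```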
Key techniques:
Training on long sequences demands substantial compute, so a progressive length-expansion strategy is used to grow the context length of Qwen2.5-1M from 4K to 256K tokens in multiple stages, starting from an intermediate checkpoint of the pre-trained Qwen2.5, whose context length at that point is 4K.
During pre-training, the context length is gradually increased from 4K to 256K tokens, and the Adjusted Base Frequency scheme raises the RoPE base frequency from 10,000 to 10,000,000.
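To see what raising the base frequency does, the sketch below computes the standard RoPE inverse frequencies, inv_freq[i] = base^(-2i/d), for both bases; the head dimension of 128 is an illustrative assumption.

```python
# Sketch: effect of Adjusted Base Frequency on RoPE.
# Raising the base from 1e4 to 1e7 slows the rotation of the low-frequency
# dimensions, so relative positions remain distinguishable over far longer
# contexts.
import numpy as np

def rope_inv_freq(base: float, dim: int = 128) -> np.ndarray:
    # Standard RoPE: one frequency per pair of dimensions.
    return base ** (-np.arange(0, dim, 2) / dim)

for base in (10_000.0, 10_000_000.0):
    wavelengths = 2 * np.pi / rope_inv_freq(base)
    print(f"base={base:>12,.0f}  longest wavelength ~ {wavelengths.max():,.0f} tokens")
```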
In the reinforcement learning stage, the model is trained on short texts (up to 8K tokens). After this training pipeline, the final Instruct model can handle sequences of up to 256K tokens.
After the above training process, the model's context length is still only 256K tokens; to extend it to 1M tokens, length extrapolation techniques are applied.
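A common family of such techniques, Dual Chunk Attention, remaps position indices so that no relative distance seen by attention exceeds the trained window. The sketch below illustrates only the core remapping idea under that assumption; the actual method additionally distinguishes intra-chunk, inter-chunk, and successive-chunk attention and is not reproduced here.

```python
# Illustrative sketch of chunk-based position remapping, the general idea
# behind Dual-Chunk-Attention-style length extrapolation: reuse local
# position indices per chunk so every relative distance stays within the
# trained 256K window. Simplified for illustration only.
def remapped_position(pos: int, chunk_size: int = 256_000) -> int:
    # Tokens keep their local offset within a chunk, so no relative
    # distance inside a chunk exceeds chunk_size - 1.
    return pos % chunk_size

for pos in (0, 255_999, 256_000, 900_000):
    print(pos, "->", remapped_position(pos))
```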
On inference speed, to accelerate the prefill stage, the research team introduced a sparse attention mechanism based on MInference.
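MInference-style prefill exploits the observation that long-context attention is dominated by a few "vertical" columns and "slash" diagonals. The sketch below builds such a mask for a toy sequence; the selected indices are illustrative assumptions, not the ones MInference would actually find.

```python
# Sketch of a "vertical-slash" sparsity pattern: each query attends only to
# a few global columns plus diagonals at fixed relative offsets, instead of
# the full causal mask.
import numpy as np

def vertical_slash_mask(n: int, verticals, slashes) -> np.ndarray:
    mask = np.zeros((n, n), dtype=bool)
    causal = np.tril(np.ones((n, n), dtype=bool))
    for col in verticals:                  # always-attended columns (e.g. attention sinks)
        mask[:, col] = True
    for offset in slashes:                 # diagonals at fixed relative offsets
        mask |= np.eye(n, k=-offset, dtype=bool)
    return mask & causal                   # respect causality

m = vertical_slash_mask(8, verticals=[0, 1], slashes=[0, 2])
print(m.astype(int))
```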
In addition, several further improvements are introduced: chunked prefill, an integrated length extrapolation scheme, sparsity refinement, and other optimizations.
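Of these, chunked prefill is the easiest to picture: rather than one forward pass over the entire 1M-token prompt, the prompt is processed in fixed-size chunks while the KV cache grows incrementally, capping peak activation memory per step. A minimal sketch follows; `forward_with_cache` is a hypothetical interface used only for illustration.

```python
# Minimal sketch of chunked prefill. `model.forward_with_cache` is a
# hypothetical method: it runs one forward pass over `chunk`, attending to
# the cached keys/values of all earlier chunks, and returns the grown cache.
def chunked_prefill(model, prompt_ids, chunk_size=32_768):
    kv_cache = None
    for start in range(0, len(prompt_ids), chunk_size):
        chunk = prompt_ids[start:start + chunk_size]
        kv_cache = model.forward_with_cache(chunk, kv_cache)
    return kv_cache  # ready for the decoding phase
```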
Deployment memory requirements:
Qwen2.5-7B-Instruct-1M: at least 120GB of GPU memory (VRAM), totaled across multiple GPUs.
Qwen2.5-14B-Instruct-1M: at least 320GB of GPU memory (VRAM), totaled across multiple GPUs.
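Much of that budget is the KV cache at a 1M-token context. A back-of-envelope calculation, assuming the Qwen2.5-7B configuration of 28 layers, 4 KV heads, and head dimension 128 in BF16 (check the model card for the exact values):

```python
# Back-of-envelope KV-cache size at a 1M-token context.
# Config values below are assumed from the Qwen2.5-7B model card.
layers, kv_heads, head_dim, bytes_per = 28, 4, 128, 2
seq_len = 1_000_000
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per * seq_len  # K and V
print(f"KV cache ~ {kv_bytes / 2**30:.1f} GiB")  # ~53.4 GiB, before weights
```

Adding the model weights and activation workspace on top of roughly 53 GiB of KV cache makes the 120GB total for the 7B model plausible.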
Model links:
Technical report:
Demo link: