Artificial Intelligence thread

OptimusLion

New Member
Registered Member
Another excellent domestic open-source AI video generation model: Step-Video-T2V. The model has 30 billion parameters and can generate high-quality videos up to 204 frames long. To improve compute efficiency and picture quality, the development team designed a deep-compression variational autoencoder (Video-VAE) that compresses the video 16× spatially and 8× temporally while still preserving excellent reconstruction quality.
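To get a rough sense of what that compression buys the diffusion backbone, here is a minimal shape calculation. The resolution and latent channel count below are made-up assumptions for illustration, not Step-Video's actual settings:

```python
# Illustrative only: how much a deep-compression Video-VAE shrinks the grid
# a diffusion backbone has to model. Resolution and channel count are assumed.

def latent_shape(frames, height, width,
                 t_stride=8, s_stride=16, latent_channels=16):
    """Latent grid after 8x temporal and 16x (per-axis) spatial compression."""
    return (frames // t_stride,
            height // s_stride,
            width // s_stride,
            latent_channels)

# A 204-frame clip at an assumed 544x992 resolution:
print(latent_shape(204, 544, 992))   # -> (25, 34, 62, 16)
# 25*34*62 = 52,700 latent positions vs. ~110M raw pixel positions,
# i.e. roughly a 2,000x reduction before the DiT ever sees the video.
```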

Step-Video supports Chinese and English input, using a bilingual text encoder to accurately parse the user's text description. It is trained with a DiT architecture using 3D full attention and denoises with the Flow Matching method to generate clear, natural frames. In addition, the team introduced video-based Direct Preference Optimization (Video-DPO), which uses human feedback to further improve video quality, reduce artifacts, and make the output smoother and more realistic.
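For readers unfamiliar with Flow Matching, the training objective boils down to regressing the network toward the velocity that carries a noise sample to a data sample along a simple path. A minimal PyTorch-style sketch follows; the model call signature and conditioning are placeholders, and the straight-line (rectified-flow-style) schedule is the common variant, not necessarily Step-Video's exact formulation:

```python
import torch

def flow_matching_loss(model, x1, text_emb):
    """One conditional flow-matching training step (rectified-flow style sketch).

    x1       : clean video latents from the Video-VAE, shape (B, ...)
    text_emb : bilingual text-encoder embeddings used as conditioning
    """
    b = x1.shape[0]
    x0 = torch.randn_like(x1)                      # pure-noise endpoint
    t  = torch.rand(b, *([1] * (x1.dim() - 1)),    # per-sample timestep in [0, 1)
                    device=x1.device)
    xt = (1 - t) * x0 + t * x1                     # straight-line interpolation
    v_target = x1 - x0                             # constant target velocity
    v_pred = model(xt, t.flatten(), text_emb)      # backbone predicts velocity
    return torch.nn.functional.mse_loss(v_pred, v_target)
```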

 

sanctionsevader

New Member
Registered Member
This seems to be another banger.

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
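To make the bulleted components more concrete, here is a toy single-query sketch of how coarse compression and fine-grained block selection can be combined. The block size, top-k, mean-pooling, and fixed 50/50 mix are all assumptions for illustration; NSA itself uses learned compression, learned per-branch gating, an additional sliding-window branch, and hardware-aligned kernels that this sketch does not represent:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nsa_like_attention(q, K, V, block=64, top_k=4):
    """Toy single-query sketch of the two sparse branches described above.

    Coarse branch: each block of keys/values is mean-pooled into one summary
    token and attended in full (cheap, gives global context).
    Fine branch:   the blocks whose summaries score highest against the query
    are attended at full token resolution (top-k selection).
    """
    n, d = K.shape
    n_blocks = n // block
    Kb = K[:n_blocks * block].reshape(n_blocks, block, d)
    Vb = V[:n_blocks * block].reshape(n_blocks, block, d)

    # Coarse-grained compression: one pooled key/value per block.
    K_cmp, V_cmp = Kb.mean(axis=1), Vb.mean(axis=1)
    scores_cmp = K_cmp @ q / np.sqrt(d)
    out_cmp = softmax(scores_cmp) @ V_cmp

    # Fine-grained selection: keep full tokens of the top-k scoring blocks.
    sel = np.argsort(scores_cmp)[-top_k:]
    K_sel = Kb[sel].reshape(-1, d)
    V_sel = Vb[sel].reshape(-1, d)
    out_sel = softmax(K_sel @ q / np.sqrt(d)) @ V_sel

    # Fixed 50/50 mix stands in for NSA's learned per-branch gates.
    return 0.5 * out_cmp + 0.5 * out_sel

q = np.random.randn(64)
K = np.random.randn(4096, 64)
V = np.random.randn(4096, 64)
print(nsa_like_attention(q, K, V).shape)   # (64,)
```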
 

OptimusLion

New Member
Registered Member
DeepSeek just published a new paper! (The paper was co-published with Peking University, and Liang Wenfeng is also a co-author.)

This paper, "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", introduces native sparse attention (NSA). With NSA, the model no longer needs to attend to the entire context when computing attention; it focuses only on the most important parts. This greatly reduces the amount of computation and speeds up attention.


NSA delivers improvements in both efficiency and accuracy. On efficiency, for 64k-token sequences NSA achieves up to 11.6× faster decoding, with forward and backward passes accelerated by 9.0× and 6.0× respectively, and the speedups grow as sequence length increases. On accuracy, the NSA-pretrained model performs on par with or slightly better than the full-attention model on general benchmarks, and significantly outperforms both the full-attention model and other sparse-attention methods on long-context tasks and reasoning evaluations. For example, its average LongBench score exceeds the full-attention model's by 0.032, and the gains are even more pronounced on multi-hop question-answering tasks that require complex reasoning. In short, NSA greatly improves compute efficiency while preserving the model's performance, and even enhances it on specific tasks.
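A back-of-the-envelope check on where a decoding speedup of that size can come from: at decode time each new query only has to read the compressed block summaries, the selected blocks, and a short local window from the KV cache, rather than all 64k cached tokens. The budget numbers below are assumptions chosen to be in the same ballpark as the paper's configuration, not exact values taken from it:

```python
# Rough per-query memory-access count during decoding at 64k context.
seq_len = 64 * 1024

# Full attention: every cached key/value is read for each new token.
full_reads = seq_len

# NSA-style sparse attention (assumed budgets):
cmp_stride, sel_blocks, sel_block_size, window = 16, 16, 64, 512
sparse_reads = (seq_len // cmp_stride          # coarse block summaries
                + sel_blocks * sel_block_size  # fine-grained selected tokens
                + window)                      # local sliding window

print(full_reads, sparse_reads, round(full_reads / sparse_reads, 1))
# 65536 5632 11.6  -> same order as the reported decoding speedup
```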


Paper: arxiv.org/pdf/2502.11089
 


9dashline

Captain
Registered Member
This speedup seems to be for both training and inference.

Soon, a consumer GPU will be more valuable than most lawyers, doctors, programmers and PhDs
 

luminary

Senior Member
Registered Member

Grok 3 came out. It seems to be around an o1-pro level model, slightly better than R1, o3-mini, and o1. Elon says they trained it on 200k H100 GPUs. Honestly kind of a letdown given the number of GPUs they had access to.
Crap architecture. No amount of GPUs will get around bad software. Elon will fire somebody, and then the model will improve slowly.

Seeing the explosive growth of cursor.ai, with Trae IDE doing nothing to slow it despite being free (but an inferior product with a timid AI), ByteDance definitely has development issues. Did one intern really do that much damage?

China is going to integrate AI chatbots into things like TVs and cars. It doesn't make sense; this whole thing is turning into a mania.
They're releasing products fast and seeing what sticks before investing real time into them. I hope customers get the option to opt out if they don't like the jankiness.
 