Artificial Intelligence thread

Legume7

Just Hatched
Registered Member
DeepSeek releases new model: Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
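Roughly, the "decoupled visual encoding" idea is: one vision encoder feeds the transformer for image understanding, a separate discrete tokenizer supplies image tokens for generation, and a single autoregressive transformer handles both. Below is a minimal PyTorch-style sketch, assuming SigLIP-like features for understanding and a VQ codebook for generation; the class names, dimensions, and stand-in modules are illustrative, not DeepSeek's actual code.

```python
import torch
import torch.nn as nn

class DecoupledJanusSketch(nn.Module):
    """Illustrative only: two visual pathways, one shared transformer trunk."""

    def __init__(self, d_model=1024, n_heads=16, n_layers=4,
                 text_vocab=32000, vq_codes=16384, vit_dim=1152):
        super().__init__()
        # Understanding pathway: continuous features from a semantic vision
        # encoder (SigLIP-like), projected into the LLM's embedding space.
        self.und_proj = nn.Linear(vit_dim, d_model)
        # Generation pathway: discrete VQ image codes with their own embedding
        # table and output head; the model predicts the next code.
        self.gen_embed = nn.Embedding(vq_codes, d_model)
        self.gen_head = nn.Linear(d_model, vq_codes)
        # Shared components: text embeddings and a single transformer trunk
        # (causal masking omitted here for brevity).
        self.text_embed = nn.Embedding(text_vocab, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(block, n_layers)
        self.text_head = nn.Linear(d_model, text_vocab)

    def understand(self, siglip_feats, text_ids):
        # siglip_feats: (B, N_img, vit_dim), text_ids: (B, N_txt)
        seq = torch.cat([self.und_proj(siglip_feats), self.text_embed(text_ids)], dim=1)
        return self.text_head(self.trunk(seq))        # text-token logits

    def generate_step(self, text_ids, image_code_ids):
        # Text prompt + image codes generated so far -> logits over next VQ code.
        seq = torch.cat([self.text_embed(text_ids), self.gen_embed(image_code_ids)], dim=1)
        return self.gen_head(self.trunk(seq))[:, -1]  # (B, vq_codes)
```

The point of the split is that the understanding pathway can optimize for semantic features while the generation pathway optimizes for reconstruction, so a single encoder no longer has to serve both conflicting objectives.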
 


Moonscape

Junior Member
Registered Member
A second model has hit the techbros
 

gpt

Junior Member
Registered Member
Nvidia has released a statement today:

"DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling," Nvidia stated. "DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant.” DeepSeek's progress has been met with both admiration and apprehension. Its V3 model reportedly achieves performance comparable to OpenAI’s GPT-4 while using just 5% of the GPU compute, and its R-1 model operates at 1/13th of the cost of GPT-4.

These milestones, Nvidia noted, underscore the ingenuity spurred by resource constraints. Nvidia stressed that DeepSeek's breakthroughs still rely on Nvidia hardware. “Inference requires significant numbers of Nvidia GPUs and high-performance networking,” the company said. “We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
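For anyone unfamiliar, "test-time scaling" just means spending more inference compute per query to improve answers, e.g. sampling many candidate responses and taking a majority vote or the best-scored one. A toy sketch of the majority-vote flavor is below; `sample_answer` is a hypothetical stand-in for a single LLM sampling call, not any specific API.

```python
# Toy illustration of test-time scaling via self-consistency (majority vote).
# More samples (n) = more inference compute = usually better accuracy, which is
# Nvidia's argument for why inference-side GPU demand keeps growing.
from collections import Counter
from typing import Callable

def majority_vote_answer(sample_answer: Callable[[str], str],
                         prompt: str, n: int = 16) -> str:
    """Sample the model n times and return the most frequent answer string."""
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Usage sketch (hypothetical model call):
# best = majority_vote_answer(lambda p: my_llm.sample(p, temperature=0.8),
#                             "What is 17 * 23?", n=32)
```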
 

antwerpery

Junior Member
Registered Member
If all this is true, the CHIPS Act, which was designed to slow China in the AI race, may turn out to be one of the worst backfires in history. (I tried to warn a number of people in the Biden administration about this possibility in the summer of 2023; instead of listening, they recently doubled down, in one of Biden’s final executive orders.)

The most obvious worry was that the CHIPS Act would encourage China to build its own chips. People in the White House indeed foresaw that, and China has in fact already invested many billions towards that end. Still, many in Washington appeared to see the export controls as an urgently needed delaying tactic, perhaps guessing that they would buy the US a few critical years and somehow secure a permanent advantage.

I never bought this argument because I figured that getting to GPT-5 first might buy the winner better boilerplate text writing, but not military genius. Getting there first just wasn’t going to matter in the long run, any more than a US company getting to GPT-4-level GenAI first did, in the grand scheme of things; as we saw this week, the advantage was fleeting. Nonetheless, the Biden administration, perhaps caught in the hype, seemed willing to gamble an awful lot for a short-term advantage, even if it meant straining relations with Beijing or spurring China’s own future innovation in silicon manufacturing. (Trump’s boosting of Stargate seemingly fits with the same magical thinking about LLMs, premised on the same hope of achieving a supremacy that may never come.)

Instead, as we have seen in recent weeks, Silicon Valley’s initial advantage in LLMs evaporated quickly, despite export controls. Not (as some of us thought) because China ramped up H100 equivalents quickly (a big multi-year job, far from complete), but because it figured out how to work around them.

We accidentally upped their technical game. In the FT, Angela Zhang argued, “China’s achievements in efficiency are no accident. They directly respond to the escalating export restrictions imposed by the US and its allies. By limiting China’s access to advanced AI chips, the US has inadvertently spurred its innovation.”


And we may have kneecapped our greatest silicon company, Nvidia, in exchange for very little aside from a brief stock bump (Nvidia prospered for a while when too many people bet wrongly that the answer to AI lay in its premium chips).

When will America figure out that sanctions don't work against China? Honestly, the sanctions worked out very well for China.
 