Artificial Intelligence thread

Legume7

Just Hatched
Registered Member
DeepSeek releases new model: Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
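Roughly, the "decoupled visual encoding" idea is: one vision encoder feeds the transformer for image understanding, a separate discrete tokenizer supplies image tokens for generation, and a single autoregressive transformer handles both. Below is a minimal PyTorch-style sketch, assuming SigLIP-like features for understanding and a VQ codebook for generation; the class names, dimensions, and stand-in modules are illustrative, not DeepSeek's actual code.

```python
import torch
import torch.nn as nn

class DecoupledJanusSketch(nn.Module):
    """Illustrative only: two visual pathways, one shared transformer trunk."""

    def __init__(self, d_model=1024, n_heads=16, n_layers=4,
                 text_vocab=32000, vq_codes=16384, vit_dim=1152):
        super().__init__()
        # Understanding pathway: continuous features from a semantic vision
        # encoder (SigLIP-like), projected into the LLM's embedding space.
        self.und_proj = nn.Linear(vit_dim, d_model)
        # Generation pathway: discrete VQ image codes with their own embedding
        # table and output head; the model predicts the next code.
        self.gen_embed = nn.Embedding(vq_codes, d_model)
        self.gen_head = nn.Linear(d_model, vq_codes)
        # Shared components: text embeddings and a single transformer trunk
        # (causal masking omitted here for brevity).
        self.text_embed = nn.Embedding(text_vocab, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(block, n_layers)
        self.text_head = nn.Linear(d_model, text_vocab)

    def understand(self, siglip_feats, text_ids):
        # siglip_feats: (B, N_img, vit_dim), text_ids: (B, N_txt)
        seq = torch.cat([self.und_proj(siglip_feats), self.text_embed(text_ids)], dim=1)
        return self.text_head(self.trunk(seq))        # text-token logits

    def generate_step(self, text_ids, image_code_ids):
        # Text prompt + image codes generated so far -> logits over next VQ code.
        seq = torch.cat([self.text_embed(text_ids), self.gen_embed(image_code_ids)], dim=1)
        return self.gen_head(self.trunk(seq))[:, -1]  # (B, vq_codes)
```

The point of the split is that the understanding pathway can optimize for semantic features while the generation pathway optimizes for reconstruction, so a single encoder no longer has to serve both conflicting objectives.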
 


Moonscape

Junior Member
Registered Member
A second model has hit the techbros
 

gpt

Junior Member
Registered Member
Nvidia has released a statement today:

"DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling," Nvidia stated. "DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant.” DeepSeek's progress has been met with both admiration and apprehension. Its V3 model reportedly achieves performance comparable to OpenAI’s GPT-4 while using just 5% of the GPU compute, and its R-1 model operates at 1/13th of the cost of GPT-4.

These milestones, Nvidia noted, underscore the ingenuity spurred by resource constraints. Nvidia stressed that DeepSeek's breakthroughs still rely on Nvidia hardware. “Inference requires significant numbers of Nvidia GPUs and high-performance networking,” the company said. “We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
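For anyone unfamiliar, "test-time scaling" just means spending more inference compute per query to improve answers, e.g. sampling many candidate responses and taking a majority vote or the best-scored one. A toy sketch of the majority-vote flavor is below; `sample_answer` is a hypothetical stand-in for a single LLM sampling call, not any specific API.

```python
# Toy illustration of test-time scaling via self-consistency (majority vote).
# More samples (n) = more inference compute = usually better accuracy, which is
# Nvidia's argument for why inference-side GPU demand keeps growing.
from collections import Counter
from typing import Callable

def majority_vote_answer(sample_answer: Callable[[str], str],
                         prompt: str, n: int = 16) -> str:
    """Sample the model n times and return the most frequent answer string."""
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Usage sketch (hypothetical model call):
# best = majority_vote_answer(lambda p: my_llm.sample(p, temperature=0.8),
#                             "What is 17 * 23?", n=32)
```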
 

antwerpery

Junior Member
Registered Member
If all this is true, the CHIPS Act, which was designed to slow China in the AI race, may turn out to be one of the worst backfires in history. (I tried to warn a number of people in the Biden administration about this possibility in the summer of 2023; instead of listening, they recently doubled down, in one of Biden’s final executive orders.)

The most obvious worry was that the CHIPS Act would encourage China to build its own chips. People in the White House indeed foresaw that, and China has in fact already invested many billions towards that end. Still, many in Washington appeared to see the export controls as an urgently needed delaying tactic, perhaps guessing that they would buy the US a few critical years and somehow secure a permanent advantage.

I never bought this argument because I figured that getting to GPT-5 first might buy the winner better boilerplate text writing, but not military genius. Getting there first just wasn’t going to matter in the long run, any more than a US company getting to GPT-4-level GenAI first did, in the grand scheme of things; as we saw this week, the advantage was fleeting. Nonetheless, the Biden administration, perhaps caught in the hype, seemed willing to gamble an awful lot for a short-term advantage, even if it meant straining relations with Beijing or spurring China’s own future innovation in silicon manufacturing. (Trump’s boosting of Stargate seemingly fits with the same magical thinking about LLMs, premised on the same hope of achieving a supremacy that may never come.)

Instead, as we have seen in recent weeks, Silicon Valley’s initial advantage in LLMs evaporated quickly, despite export controls. Not (as some of us thought) because China ramped up H100 equivalents quickly (a big multi-year job, far from complete), but because it figured out how to work around them.

We accidentally upped their technical game. In the FT, Angela Zhang argued, “China’s achievements in efficiency are no accident. They directly respond to the escalating export restrictions imposed by the US and its allies. By limiting China’s access to advanced AI chips, the US has inadvertently spurred its innovation.”


And we may have kneecapped our greatest silicon company, Nvidia, in exchange for very little aside from a brief stock bump (Nvidia prospered for a while when too many people bet wrongly that the answer to AI lay in its premium chips).

When will America figure out that sanctions don't work against China? Honestly, the sanctions worked out very well for China.
 