Artificial Intelligence thread

Coalescence

Senior Member
Registered Member
Currently even the latest and greatest LLMs (looking at you, ClosedAI o1) sometimes crash and burn even with simple standalone React components. They are completely and utterly useless at writing code for my legacy backend codebase. Unless I can fit the entire codebase into the context window, no LLM can contribute without me spoonfeeding it all the relevant context... which is like 90% of the work.
I agree with this very much, especially after testing the o1-preview model and finally hitting the token limit. Before laying out the problems I still have with the newest model: the codebase I'm working with is written in Vue.js 2 and uses the Element UI and iView component libraries. The problems I have with the model are:
1. It keeps using syntax and methods that only exist in Vue.js 3, even when I've already specified, and reminded it, that the code is in Vue.js 2. I had to point it to the correct Vue 2 methods and functions before the code would work (see the sketch after this list).
2. After modifying the code myself and then asking it for further changes, I would give it the modified code to generate from. Sometimes it reverts my changes in the new code; other times it just iterates on the old code.
3. As I mentioned before, it has a tendency to change parts of the code that don't relate to the request. This is still a problem in o1-preview, and it's very annoying to figure out what went wrong and to copy-paste the working portions of the old code back in.
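To make point 1 concrete, here is a hypothetical counter component (made up for illustration, not from my codebase): first the way the model keeps writing it (Vue 3 Composition API), then the way a Vue 2 codebase actually needs it (Options API).

```typescript
// What the model keeps emitting: Vue 3 Composition API (ref/onMounted only exist in Vue 3 and Vue 2.7+).
import { ref, onMounted } from 'vue';

export const CounterVue3 = {
  setup() {
    const count = ref(0);                               // reactive state, Vue 3 style
    onMounted(() => console.log('counter mounted'));    // Composition API lifecycle hook
    return { count };
  },
};

// What a plain Vue 2 + Element UI codebase actually expects: the Options API.
export const CounterVue2 = {
  data() {
    return { count: 0 };                                // reactive state, Vue 2 style
  },
  mounted() {
    console.log('counter mounted');                     // Options API lifecycle hook
  },
};
```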

The newest model is definitely smarter than before, but since it has the same problems as the previous model listed above, now with worse speed, I would rather stick with the old model and iterate on its solutions manually. Also, have you guys noticed you can't provide file attachments to o1-preview and o1-mini?
 

9dashline

Captain
Registered Member
Running large language models like Meta's Llama 3.1 (405 billion parameters) already requires serious hardware, like an 8-way NVIDIA H100 cluster. But it’s not just about throwing more GPUs at the problem—it’s about how these models are utilized during inference. Techniques like Chain of Thought (CoT) don’t necessarily need more GPUs; they need more time.
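Rough memory arithmetic (weights only, my own back-of-the-envelope numbers) behind that hardware requirement; real deployments also need room for KV cache and activations on top of this:

```typescript
// Back-of-the-envelope memory math for a 405B-parameter model (weights only).
const params = 405e9;
const h100MemGB = 80;                  // HBM per H100
const nodeMemGB = 8 * h100MemGB;       // 640 GB on an 8-way node

const fp16GB = (params * 2) / 1e9;     // ~810 GB at 16-bit weights: does not fit on one 8-GPU node
const fp8GB = (params * 1) / 1e9;      // ~405 GB at 8-bit weights: fits, with some headroom left

console.log({ nodeMemGB, fp16GB, fp8GB });
```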

Think of it like AlphaGo’s Monte Carlo simulations: once AlphaGo’s neural network made its move predictions, its Elo rating improved by running multiple playouts to refine those predictions further. CoT is similar—during inference, the model "questions itself" and iterates on its reasoning, not just outputting a single-pass response but evaluating and refining multiple chains internally. So while the hardware footprint stays roughly the same, inference time increases due to these deeper, more sophisticated reasoning steps.
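A toy sketch of the general idea (self-consistency style sampling), not a description of o1's actual internals: sample several independent reasoning chains for the same prompt and vote over the final answers, so capability scales with the number of inference passes rather than with parameter count. The `generateChain` parameter is a stand-in for whatever model call you would use.

```typescript
// Toy sketch of inference-time scaling via multiple reasoning chains (self-consistency style).
type Chain = { reasoning: string; answer: string };

async function answerWithSelfConsistency(
  // Stand-in for one sampled chain-of-thought completion from the model being called.
  generateChain: (prompt: string) => Promise<Chain>,
  prompt: string,
  samples = 16,
): Promise<string> {
  // Each sample is a full inference pass: more samples means more time on the same hardware.
  const chains = await Promise.all(
    Array.from({ length: samples }, () => generateChain(prompt)),
  );

  // Majority vote over final answers; mistakes in individual chains tend to wash out.
  const votes = new Map<string, number>();
  for (const { answer } of chains) {
    votes.set(answer, (votes.get(answer) ?? 0) + 1);
  }
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```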

But this shift towards CoT is also a subtle admission that brute-force scaling of models—just piling on more and more parameters—is starting to show its limitations. We’re moving away from simply making models bigger to making them “think” more effectively. It’s not unlike the early days of CPUs: at first, it was all about more gigahertz until that ran into thermal constraints. Then it became about multi-core designs and optimizing software to take advantage of parallelism. Eventually, GPUs became the new frontier. Now, with LLMs, instead of just scaling up parameter counts endlessly, techniques like CoT represent a new way to push the boundaries without hitting the same brick walls of diminishing returns.

Looking ahead, as rumors suggest models will soon hit the 3-5 trillion parameter mark, we’re likely going to see even more emphasis on complex inference processes rather than just parameter inflation. It’s not just about raw power anymore but smarter utilization, requiring better memory management and model partitioning. CoT is just one example of optimizing inference to continue scaling, not in size, but in capability and sophistication.
 

tphuang

Lieutenant General
Staff member
Super Moderator
VIP Professional
Registered Member
9dashline said:
Techniques like Chain of Thought (CoT) don’t necessarily need more GPUs; they need more time. [...] So while the hardware footprint stays roughly the same, inference time increases due to these deeper, more sophisticated reasoning steps.
If you need more time per request then that is time those GPUs can't be used to run inference on anything else.

As such, you need more GPUs to handle more requests.
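Rough numbers (made up purely for illustration) to show the scaling: fleet size grows in proportion to how long each request occupies a GPU.

```typescript
// Illustrative-only sizing: if each request holds a GPU for longer, you need proportionally
// more GPUs to serve the same traffic (Little's law, with a utilization fudge factor).
function gpusNeeded(requestsPerSecond: number, gpuSecondsPerRequest: number, utilization = 0.7): number {
  return Math.ceil((requestsPerSecond * gpuSecondsPerRequest) / utilization);
}

console.log(gpusNeeded(100, 2));   // ~2 GPU-seconds per single-pass answer -> 286 GPUs
console.log(gpusNeeded(100, 20));  // ~20 GPU-seconds per long-CoT answer   -> 2858 GPUs
```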
 

diadact

New Member
Registered Member
tphuang said:
If you need more time per request then that is time those GPUs can't be used to run inference on anything else.

As such, you need more GPUs to handle more requests.
Right now o1 just outputs text; once models start outputting videos and images, the requirement for compute will go through the roof.
 