Artificial Intelligence thread

FairAndUnbiased

Brigadier
Registered Member
It's not the biggest Chinese LLM; there's WuDao with 1.75 trillion parameters. Huawei's article highlights the data scarcity issue - only 329 billion tokens for an almost 1.1T-parameter model, while GPT-3, with 175B parameters, was trained on 499 billion tokens. I am curious about the training set size for WuDao.
If there is not enough data for all the parameters, does that not mean that the 'problem' is underdefined? What happens to the unfilled parameters?
 

xypher

Senior Member
Registered Member
If there is not enough data for all the parameters, does that not mean that the 'problem' is underdefined? What happens to the unfilled parameters?
Parameters will be updated with *something* even with a single batch of data, so they are always filled. The issue is that such a model would be heavily underfitting the domain space - the amount of data would be too scarce to 'describe' the complex relations within it, and the model would not be able to capture them, i.e. it would just output garbage because the parameters would stay close to their initial random distribution. That is the most obvious case, but underfitting varies in severity - in the case of PanGu-Sigma, the model clearly works, but most likely it is not reaching its full potential because of limited data.
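A minimal sketch of that first point - even a single batch moves every parameter away from its random initialisation (toy two-layer network, assuming PyTorch; obviously nothing like PanGu's scale):

```python
# Toy illustration: one gradient step on one batch touches every parameter,
# so the parameters are "filled" - just not with anything useful yet.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
before = [p.detach().clone() for p in model.parameters()]

x, y = torch.randn(4, 8), torch.randn(4, 1)          # a single tiny batch
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.optim.SGD(model.parameters(), lr=0.1).step()

# Every parameter tensor has moved, but the values are still essentially
# random noise - the model is badly underfitting, not "empty".
for prev, p in zip(before, model.parameters()):
    print((p.detach() - prev).abs().max().item())
```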

I should also note that underfitting can happen when the model itself is too 'simple' - lacking the capacity to capture more complex interactions - which is why we usually see improvements with larger models, and the general trend has been 'the larger, the better' (note: that does not mean an architecturally better but smaller model cannot outperform a larger one). That can also backfire, as training for a long time on a small (relative to the model size) dataset can lead to overfitting: the model would simply memorize the training tokens, achieve near-perfect metrics on the training set, but fail to generalize to any new data. This issue, again, can vary in severity - it is pretty interesting with LLMs because I think it is hard to detect overfitting with them due to the humongous data sizes, as getting an out-of-domain test set would be difficult.
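And a toy illustration of both failure modes - plain numpy curve fitting, nothing like an LLM, just to show how the gap between training and held-out error behaves:

```python
# Under- vs overfitting on a toy regression task (assumes only numpy).
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 12)
y_train = np.sin(3 * x_train) + 0.15 * rng.standard_normal(12)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(3 * x_test) + 0.15 * rng.standard_normal(200)

for degree in (1, 4, 9):          # too simple, about right, too flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Degree 1 underfits (both errors high); the highest degree memorizes the
# training noise (train error collapses while held-out error gets worse).
```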
 

tacoburger

Junior Member
Registered Member
What's that again with China not being able to catch up with ChatGPT-2?

That's trillion with a T

Hey, I want China to succeed in this as much as everyone else here. I guess I'm glad that at least someone is taking LLMs seriously. But I never said that they couldn't catch up. I have always said that the issue was that they couldn't even recognise the potential that GPT-2 or GPT-3 had. Just look at the number of Western LLMs that popped up after GPT-2/GPT-3 - dozens of them, improving with every model release. Everyone was working on their own version of that, or on another generative AI model like text-to-image. It took ChatGPT to kick the Chinese AI sector into the same high gear.

More time and attention on the general topic is always good. All the current issues that AI has - training time, lack of data, lack of internal logic, etc. - can be solved with enough time, attention, and money, but only if people are willing to do so.
 

bobsagget

New Member
Registered Member
I said I don't know much about software. I didn't mention ChatGPT. If it can do that navigation, then great. Are you an SME in AI? Do you work at Google or something?
My point being that your requirements were odd and nonsensical as a specification of what's “impressive” for an AI. Anyways, I provided proof that yes, AI can “fight” to achieve a desired outcome.
 

bobsagget

New Member
Registered Member
Parameters will be updated with *something* even with a single batch of data, so they are always filled. The issue is that such a model would be heavily underfitting the domain space - the amount of data would be too scarce to 'describe' the complex relations within it, and the model would not be able to capture them, i.e. it would just output garbage because the parameters would stay close to their initial random distribution. That is the most obvious case, but underfitting varies in severity - in the case of PanGu-Sigma, the model clearly works, but most likely it is not reaching its full potential because of limited data.

I should also note that underfitting can happen when the model itself is too 'simple' - lacking the capacity to capture more complex interactions - which is why we usually see improvements with larger models, and the general trend has been 'the larger, the better' (note: that does not mean an architecturally better but smaller model cannot outperform a larger one). That can also backfire, as training for a long time on a small (relative to the model size) dataset can lead to overfitting: the model would simply memorize the training tokens, achieve near-perfect metrics on the training set, but fail to generalize to any new data. This issue, again, can vary in severity - it is pretty interesting with LLMs because I think it is hard to detect overfitting with them due to the humongous data sizes, as getting an out-of-domain test set would be difficult.
Yeah, overfitting has been an issue with some of the homebrew Pygmalion AIs.
But if you aren't an SME, then we're both just randoms talking shit.
No, I'm just a slightly more well-read random talking shit. Look, no one in AI would be impressed by an “I need energy” feature. Heck, the US Army built EATR in 2009 and canned it after it was dubbed the corpse-eating robot - no man in the loop, just a robot that eats. Like, in all honesty, ChatGPT's new ability to use other software to complete a given task is insane. They are integrating design programs too, like AutoCAD.
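For what it's worth, the mechanics behind "using other software" aren't magic. Here's a bare-bones sketch of a tool-use loop - the tool name and the hard-coded stand-in model are made up for illustration, this is not OpenAI's actual plugin API:

```python
# Minimal tool-use loop: the model emits a structured "call", the harness runs
# the real software and feeds the result back into the prompt.
import json

def calculator(expression: str) -> str:
    # stand-in for any external program (CAD, browser, solver, ...)
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(prompt: str) -> str:
    # a real LLM would decide this itself; hard-coded here for the sketch
    if "RESULT" not in prompt:
        return json.dumps({"tool": "calculator", "args": {"expression": "3 * 329"}})
    return "Final answer: " + prompt.split("RESULT:")[-1].strip()

prompt = "How many billion tokens would 3x the PanGu corpus be?"
while True:
    reply = fake_model(prompt)
    try:
        call = json.loads(reply)          # structured reply => run a tool
    except json.JSONDecodeError:
        print(reply)                      # plain text => the model is done
        break
    result = TOOLS[call["tool"]](**call["args"])
    prompt += f"\nRESULT: {result}"
```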
 

Coalescence

Senior Member
Registered Member
Like, in all honesty, ChatGPT's new ability to use other software to complete a given task is insane. They are integrating design programs too, like AutoCAD.
This is the direction for AGI I wanted to see: granting the AI access to, and knowledge of, software tools to better solve the user's problem, rather than hoping it will unexpectedly develop the capability from dumping in more data and training on it. Now they just need to prune the AI model so that it can run offline and be stored on consumer-grade computers.
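Pruning in the literal sense is already a library feature. A small sketch of magnitude pruning (assuming PyTorch, and a toy linear layer rather than an actual LLM):

```python
# Magnitude pruning with PyTorch's built-in utilities (toy layer, not an LLM).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the smallest 50%
prune.remove(layer, "weight")                            # make the mask permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"weights zeroed: {sparsity:.0%}")

# In practice, local runners lean more on quantisation (8-bit / 4-bit weights)
# than on pruning alone, and pruned weights only save memory with a sparse
# runtime - so treat this as one ingredient, not the whole recipe.
```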
 

bobsagget

New Member
Registered Member
This is the direction for AGI I wanted to see: granting the AI access to, and knowledge of, software tools to better solve the user's problem, rather than hoping it will unexpectedly develop the capability from dumping in more data and training on it. Now they just need to prune the AI model so that it can run offline and be stored on consumer-grade computers.
Consumer grade is gonna be a while - Pygmalion at 13 billion parameters already eats 24 GB of VRAM on the GPU; unless you've got a rig with a few thousand GB of RAM, you ain't running that locally. I should add that some degenerate made it interface with VRChat models: https://www.reddit.com/r/PygmalionAI/comments/1141dwr
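Quick back-of-the-envelope on why 13B is rough for consumer cards - this counts weights only, ignores activations and the KV cache, and the quantised formats are just the ones local runners commonly assume:

```python
# Rough VRAM needed just to hold a 13B-parameter model's weights.
params = 13e9
for fmt, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{fmt}: {params * bytes_per_param / 1e9:.1f} GB")

# fp16: 26.0 GB -> does not fit on a 24 GB card
# int8: 13.0 GB -> fits, barely, with some room for activations
# int4:  6.5 GB -> within consumer range, at some quality cost
```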
 