News on China's scientific and technological development.

HighGround

Senior Member
Registered Member
A lot of the utility from "AI" is just data analysis. "AI" can issue unique identification numbers to objects (people, cars, whatever), learn their usual patterns, and flag significant deviations for users. This could be useful for schools, police, traffic management, and all kinds of businesses.
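To make that concrete, here is a minimal sketch of that kind of per-object deviation flagging; the object IDs, numbers, and the 3-sigma threshold are all made up for illustration.

from statistics import mean, stdev

# Minimal sketch of per-object deviation flagging: keep each object's history
# for some metric (e.g. daily trips past a checkpoint), then flag values that
# sit far outside that object's own usual pattern. IDs, numbers, and the
# 3-sigma threshold are illustrative only.
history = {
    "car_0001": [12, 14, 13, 15, 12, 13],
    "car_0002": [3, 2, 4, 3, 3, 2],
}

def flag_deviation(obj_id, new_value, k=3.0):
    """True if new_value is more than k standard deviations from this object's mean."""
    values = history[obj_id]
    mu, sigma = mean(values), stdev(values)
    return sigma > 0 and abs(new_value - mu) > k * sigma

print(flag_deviation("car_0002", 25))  # True: far outside this car's usual pattern
print(flag_deviation("car_0001", 14))  # False: within normal variation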

A lot of the benefits from AI will probably be invisible to the public, because those benefits will take the form of altered business practices, changed facility operations, and changes to legislation.
 

FairAndUnbiased

Brigadier
Registered Member
I keep seeing this "data limited by language" argument being tossed around a lot. Can anyone with a background in machine learning explain what's stopping Chinese companies from just translating the data/datasets into Chinese and correcting the translation errors, or training the model on English datasets and then translating the output?
I'm not an expert on any sort of software at all, except maybe some simple industrial machine stuff, but I know something about Chinese linguistics. It is hard to translate between English and Chinese beyond the basics because Chinese is a much more condensed language. Chinese has no gender, no inflection, no spacing to separate characters or words, heavy use of idioms to condense complex thoughts, and people often switch between vernacular and classical registers even in Weibo shitposts.

One of the funniest and stupidest translations I've seen was 来日方长 (roughly "there will be time for that in the days to come") being machine translated as "came to Japan for a long time". I literally LOLed at that. To a computer, though, how would it distinguish the 成语 (idiom) 来日方长 from 来日 ("the coming days") + [a time period]?
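To sketch why segmentation matters here, below is a toy longest-match segmenter; the word lists and the greedy strategy are simplifications for illustration, not how real Chinese NLP systems actually segment.

# Toy longest-match segmenter: whether 来日方长 comes out as one idiom or as
# 来日 plus leftover characters depends entirely on what the word list contains.
# Both word lists below are made up for illustration.
def segment(text, dictionary):
    """Greedy longest-match segmentation against a fixed word list."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):              # try the longest candidate first
            if text[i:j] in dictionary or j == i + 1:  # fall back to a single character
                tokens.append(text[i:j])
                i = j
                break
    return tokens

with_idiom = {"来日方长", "来日"}
without_idiom = {"来日"}

print(segment("来日方长", with_idiom))     # ['来日方长'] -> idiom recognized
print(segment("来日方长", without_idiom))  # ['来日', '方', '长'] -> literal pieces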
 

Andy1974

Senior Member
Registered Member
Quality is important but difficult to obtain at scale, especially with language models. The trick to models like ChatGPT is that they leverage the entire internet's "knowledge" to answer questions. If you tried to do that via manual labeling, it just wouldn't work, because it would be the equivalent of trying to reproduce the entire internet's worth of knowledge.

It's like Google Search, in that regard. Baidu doesn't have a worse algorithm, necessarily; it just has a worse internet to work with.
If you use the entire internet to train your AI, you will also be training it with all the misinformation on the internet. To have an AI that is well trained on internet data, somebody has to choose which data to use.
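A rough sketch of what "choosing which data to use" can look like in practice; the specific rules here (a length floor, exact-duplicate removal, a crude blocklist) are placeholders for the far more elaborate heuristics real data-curation pipelines use.

# Rough sketch of heuristic data curation before training. The rules and the
# blocklist are placeholders; real curation pipelines are far more elaborate.
def curate(documents, blocklist=("the earth is flat",), min_words=20):
    seen, kept = set(), []
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:                  # drop short fragments
            continue
        if any(bad in text.lower() for bad in blocklist):  # crude misinformation filter
            continue
        if text in seen:                                   # skip exact duplicates
            continue
        seen.add(text)
        kept.append(text)
    return kept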
 

FairAndUnbiased

Brigadier
Registered Member
If you use the entire internet to train your AI, you will also be training it with all the misinformation on the internet. To have an AI that is well trained on internet data, somebody has to choose which data to use.
Also, isn't image recognition AI more important anyhow for military use, i.e. training AI to find SAM sites or correlating wakes with ship types?
 

Overbom

Brigadier
Registered Member
You don't need millions of annotators for ChatGPT-like services. ChatGPT was developed with reinforcement learning involving both supervised (human) and unsupervised (on its own) learning.

The system uses machine learning algorithms to find patterns and connections in its training dataset on its own, without human involvement. It was first trained with human feedback, and as soon as they had some initial results they had different AI instances talk with each other. From that point the AI trained on its own, using an internal reward system that rewarded the conversations its reward model scored as high quality. After running the same iteration thousands and millions of times, with the AI picking the best output and incorporating traits from the best version each time, it reached the stage it is at now.
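The "generate many candidates, score them, keep the best" idea can be sketched roughly as below; generate() and reward() are made-up placeholders, and a real pipeline would use a reward model learned from human preference rankings and update the policy with PPO rather than simple selection.

import random

# Toy best-of-N selection loop: generate several candidate replies, score them
# with a stand-in reward function, keep the highest-scoring one.
def generate(prompt, rng):
    return f"reply to '{prompt}' (variant {rng.randint(0, 999)})"

def reward(reply):
    return len(reply) + random.random()    # stand-in for a learned quality score

def best_of_n(prompt, n=8, seed=0):
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=reward)     # keep the highest-scoring candidate

print(best_of_n("Explain reinforcement learning from human feedback"))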

The "million annotators" thing is now old news already and you would probably laughed out of an AI conference if you mentioned it seriously in 2023.
 

Andy1974

Senior Member
Registered Member
You don't need millions of annotators for ChatGPT-like services. ChatGPT was developed with reinforcement learning involving both supervised (human) and unsupervised (on its own) learning.

The system uses machine learning algorithms to find patterns and connections in its training dataset on its own, without human involvement. It was first trained with human feedback, and as soon as they had some initial results they had different AI instances talk with each other. From that point the AI trained on its own, using an internal reward system that rewarded the conversations its reward model scored as high quality. After running the same iteration thousands and millions of times, with the AI picking the best output and incorporating traits from the best version each time, it reached the stage it is at now.

The "million annotators" thing is now old news already and you would probably laughed out of an AI conference if you mentioned it seriously in 2023.
Well, given that unsupervised learning was developed as a result of a lack of annotators, you are dead wrong.
 

Overbom

Brigadier
Registered Member
Well, given that unsupervised learning was developed as a result of a lack of annotators, you are dead wrong.
Necessity is the mother of invention. And unsupervised learning, all things considered, trumps supervised learning.

That's not to say that the human factor will disappear soon, but the trend is clear. Calls on AI, puts on humans.
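One reason unsupervised (really self-supervised) training scales the way it does: the labels come for free from the raw text, as this toy next-token example sketches, so no annotators are needed to produce the training pairs.

# Toy next-token setup: in self-supervised training the "label" for each
# position is simply the next token of the raw text itself.
text = "necessity is the mother of invention"
tokens = text.split()

training_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, label in training_pairs[:3]:
    print(context, "->", label)
# ['necessity'] -> is
# ['necessity', 'is'] -> the
# ['necessity', 'is', 'the'] -> mother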
 

Andy1974

Senior Member
Registered Member
Necessity is the mother of invention. And unsupervised learning, all things considered, trumps supervised learning.

That's not to say that the human factor will disappear soon, but the trend is clear. Calls on AI, puts on humans.
Indeed, but what I would say to you is that it's the quality of the humans that matters now. Without humans it's just AI training AI, and ultimately those AIs were based on human annotations all the way back.

Nowadays it's the experts who add the key annotations and train the lower-cost annotators, so in the end it all comes down to your education system.
 