Artificial Intelligence thread

tphuang


Kuaishou talked about its AI business. Remember, they produced Kling.

Some notes from its earnings call:

Our 175-billion-parameter "KuaiYi" (快意) large language model has met its goal of surpassing GPT-4.0 in overall performance for Chinese-language scenarios. Our multimodal large language model has also reached GPT-4V-level capability on video content, laying a solid foundation for expanding application-layer capabilities. In addition, our text-to-image model "Kolors" (可图) was recently officially open-sourced; as the text-to-image model that best understands Chinese, Kolors has been through multiple iterations and its overall performance now surpasses the image-generation model Midjourney V5.
Its 175B-parameter KuaiYi LLM has achieved better overall performance than GPT-4.0 in Chinese-language scenarios.

Its multimodal LLM has also reached GPT-4V-level capability on video.

They have recently open-sourced the Kolors (可图) text-to-image model. Its overall performance is even better than Midjourney V5.

In the first half of 2024, nearly 20,000 merchants on the Kuaishou platform used large-model capabilities to run their operations intelligently, and average daily spend on AI-generated marketing content reached RMB 20 million in June this year, demonstrating the huge potential of large models in commercial scenarios.
Note the scale here: almost 20,000 merchants using large models for intelligent operations, and RMB 20 million per day in AIGC marketing spend as of June.

On the impact of overall AI investment on margins: in the short term, our investment in large AI models will not materially affect the group's profitability. Over the long term, AI can keep increasing the value of AI-related investment to the company's business development by empowering existing businesses and enabling innovative business scenarios.

In short, improving quality and efficiency is the company's long-term strategic mechanism, and every future strategic decision will be made and implemented with that consideration. Our AI strategy also opens up more possibilities for growing revenue while improving quality and efficiency, giving us more confidence in steadily improving long-term profitability.
Short term, AI investment won't materially affect profitability; this is a longer-term play.
 

OptimusLion

[Baidu, SenseTime, and Zhipu are the top three: IDC releases its first report on large-model platform and application market share] International Data Corporation (IDC) today released "China Large Model Platform Market Share, 2023: The First Year of Large Models (Preliminary Results)" for the first time. The data show that the market for China's large-model platforms and related applications reached RMB 1.765 billion in 2023.
 

OptimusLion

Moore Threads releases version 1.2 of its KUAE (Kua'e) intelligent computing cluster, continuing to optimize large-model training efficiency

Recently, Moore Threads officially released version 1.2 of its KUAE intelligent computing cluster. Through comprehensive optimization at the software and hardware levels, this version delivers multi-dimensional upgrades in functionality and performance, making the product more efficient, more stable, and friendlier to the ecosystem, with the aim of continuing to provide solid, reliable compute and innovation momentum for large-model training.

▼MFU up 10%, to a peak of 55%

In the new version, MFU improves by 10% when training hundred-billion-parameter models on a thousand-card (千卡, "Qianka") cluster, and MFU for dense-model cluster training can now reach up to 55%.
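For context, MFU (Model FLOPs Utilization) is simply the FLOPs the model actually consumes divided by the cluster's theoretical peak. A back-of-envelope sketch of what a 55% figure implies; all the hardware numbers below are hypothetical, since per-GPU peak FLOPs for this cluster aren't given here:

```python
# Back-of-envelope MFU estimate (illustrative numbers, not Moore Threads' data).
# For a dense transformer, training cost is commonly approximated as
# 6 * N FLOPs per token (N = parameter count).

def mfu(n_params: float, tokens_per_sec: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization = achieved model FLOPs/s over peak cluster FLOPs/s."""
    achieved = 6.0 * n_params * tokens_per_sec   # FLOPs/s actually spent on the model
    peak = n_gpus * peak_flops_per_gpu           # theoretical cluster peak
    return achieved / peak

# Hypothetical 100B-parameter run on a 1,000-GPU cluster:
print(f"{mfu(100e9, tokens_per_sec=55_000, n_gpus=1000, peak_flops_per_gpu=60e12):.0%}")  # -> 55%
```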

▼Flash Attention2 Optimization

By integrating the latest MUSA SDK platform with optimized Flash Attention2 technology, combined with the new version of Torch MUSA and operator fusion, the new release significantly improves the efficiency and resource utilization of large-model training, shortening training cycles and reducing overall cost.

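For readers who want the concrete shape of this: in stock PyTorch, the fused FlashAttention-2 kernel is reached through scaled_dot_product_attention, and Torch MUSA advertises API compatibility with this path. A minimal sketch on the standard CUDA backend; the MUSA device string in the comment is an assumption on my part:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend  # PyTorch >= 2.3

# Toy tensors shaped (batch, heads, seq_len, head_dim), half precision as in training.
q = torch.randn(4, 16, 2048, 64, device="cuda", dtype=torch.float16)  # "musa" on Torch MUSA (assumed)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Pin the fused FlashAttention backend instead of the naive math path.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```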

▼64K long text support

The new version enhances support for long-text large-model training, improving long-text understanding and generation so the models can better handle complex language tasks such as document summarization and article writing.
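As one concrete illustration, a common recipe for stretching a trained context window toward 64K is RoPE position-embedding scaling. This is the Hugging Face-style knob, not necessarily what KUAE does internally, and the checkpoint name is just a stand-in:

```python
from transformers import AutoConfig, AutoModelForCausalLM

base = "meta-llama/Llama-2-7b-hf"          # stand-in checkpoint with a 4K native context
config = AutoConfig.from_pretrained(base)
config.max_position_embeddings = 65536     # target 64K context
config.rope_scaling = {"type": "linear", "factor": 16.0}  # 4096 * 16 = 65536

model = AutoModelForCausalLM.from_pretrained(base, config=config)
# Long-context fine-tuning on 64K-token documents would follow from here.
```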

▼Support for Mixture-of-Experts (MoE) models

The MCCL communication library has completed All2All optimization, and muDNN operator matrix operations have been optimized across different shapes, to better support training MoE (Mixture of Experts) large models. This improves compute efficiency and provides a highly scalable foundation for training models with even larger parameter counts.
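The All2All being optimized here is the token-dispatch collective at the heart of expert parallelism: each rank sends away the tokens routed to remote experts and receives the tokens routed to its own. A minimal sketch with the stock torch.distributed API (MCCL is Moore Threads' analogue of NCCL; the call shown is the generic one, not MCCL-specific):

```python
import torch
import torch.distributed as dist

def dispatch_tokens(local_tokens: torch.Tensor) -> torch.Tensor:
    """Exchange equal-sized token slices with every rank (each rank hosts some experts).

    Assumes an initialized process group and that `local_tokens` is already
    permuted so that slice i holds the tokens routed to rank i's experts.
    """
    received = torch.empty_like(local_tokens)
    dist.all_to_all_single(received, local_tokens)  # the All2All collective MCCL optimizes
    return received
```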

▼Resumable training from checkpoints

Checkpoint read/write performance for large-model training has been further improved, with write times now under 2 seconds, significantly improving training efficiency.
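A minimal sketch of what resumable training looks like at the API level; this is generic PyTorch, with none of the parallel I/O that presumably gets KUAE's write time under 2 seconds:

```python
import os
import time
import torch

def save_checkpoint(path: str, model, optimizer, step: int) -> None:
    t0 = time.time()
    torch.save({"step": step,
                "model": model.state_dict(),
                "optim": optimizer.state_dict()}, path)
    print(f"checkpoint written in {time.time() - t0:.2f}s")

def load_checkpoint(path: str, model, optimizer) -> int:
    """Return the step to resume from (0 for a fresh run)."""
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])
    return ckpt["step"] + 1
```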

▼DeepSpeed optimization

DeepSpeed and DeepSpeed-Ulysses have been adapted and performance-optimized for Moore Threads GPU clusters, with enhanced support for long-text training. A variety of domestic and international large models have been adapted, and training and fine-tuning of the major open-source models on Hugging Face are supported, helping innovative companies flexibly choose among large models when building intelligent applications on Moore Threads GPUs.
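Since the announcement implies the stock DeepSpeed interface is preserved, the wiring should look like ordinary DeepSpeed usage. A minimal sketch with a toy model and made-up config values:

```python
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))  # toy stand-in

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients across ranks
    "bf16": {"enabled": True},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
# engine.backward(loss) and engine.step() then replace loss.backward()/optimizer.step().
```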

▼Improved stability

The software and hardware of the thousand-card cluster are now more mature, achieving continuous fault-free training runs of up to 15 days. The new version also introduces the KUAE Aegis reliability feature, which strengthens monitoring, automatic diagnosis, and fault recovery across GPUs, GPU memory, and collective communication.

▼Visualization/Observability

The new PerfSight performance-monitoring system displays resource consumption and performance data in real time during model training, helping to quickly detect and recover from faults and supporting performance tuning of large models.
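PerfSight itself is proprietary, but the kind of per-step telemetry such a dashboard plots can be sketched with stock counters (torch.cuda here; Torch MUSA presumably exposes an equivalent):

```python
import time
import torch

def step_telemetry(step: int, step_start: float) -> dict:
    """Collect the basic per-step stats a PerfSight-style monitor would display."""
    return {
        "step": step,
        "step_time_s": time.time() - step_start,
        "mem_allocated_gb": torch.cuda.memory_allocated() / 2**30,
        "mem_reserved_gb": torch.cuda.memory_reserved() / 2**30,
        "max_mem_gb": torch.cuda.max_memory_allocated() / 2**30,
    }
```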

▼New large models added to the built-in model library

KUAE's built-in Model Zoo has added the full LLaMA2 series, plus Baichuan, Yayi, Qwen2, Mixtral (MoE 8x7B), and other models.
 

tphuang

btw, 55% utilization is really high. I doubt it can sustain that in real operation.

Also, the compute of MT chips is still lower than competitors', so they're better suited to models under roughly 30B parameters.
 

tphuang


Weibo has announced that it is adopting Alibaba Cloud's Tongyi large model to improve content-production efficiency and community engagement. Weibo was among Alibaba Cloud's first cloud customers and is also one of the earliest customers of the Tongyi model; the two have cooperated for nearly a decade. To handle traffic surges from breaking news, Weibo can schedule more than 1,700 instances per minute on Alibaba Cloud. As the AI era arrives, on the AI-compute side Alibaba Cloud has improved GPU delivery efficiency for Weibo by 50%.

Weibo is one of Alicloud's earliest customers and can schedule over 1,700 instances per minute on Alicloud. On AI compute, Alicloud has improved GPU delivery efficiency for Weibo by 50%.

Alibaba Cloud is the most active promoter and practitioner of open-source large models. Last August, Alibaba Cloud's Tongyi model joined the open-source camp and has kept up a rapid release pace since, following an "all modalities, all sizes" open-source roadmap and shipping more than ten open-source models. Downloads of the open-source Tongyi Qianwen (Qwen) models have now exceeded 20 million.

This quarter, Alibaba Cloud released Qwen2-72B, the strongest open-source model in the world by overall performance, with improved code, math, reasoning, instruction-following, and multilingual understanding; on release it topped Hugging Face's Open LLM Leaderboard. The Chinese LLM benchmark SuperCLUE noted in its H1 2024 report that Qwen2-72B ranked first among Chinese models and is the strongest open-source model globally, "surpassing many domestic and foreign closed-source models" and "leading the global open-source ecosystem."
Alicloud's Qwen has already been downloaded more than 20 million times, and Qwen2-72B topped the Hugging Face Open LLM Leaderboard.
In addition, leading internet companies such as Trip.com (Ctrip), Ximalaya, New Oriental, Meitu, and Hello (哈啰), as well as Fortune Global 500 companies such as Starbucks, have adopted the Tongyi model; car, phone, and PC makers including FAW, XPeng, Changan Auto, Leapmotor, vivo, and Lenovo are already cooperating deeply with Alibaba Cloud on large models.
A list of companies that are all using Qwen and other large models on Alicloud.
 

OptimusLion

Baidu launches fine-tuning service for its latest flagship model, ERNIE (Wenxin) 4.0 Turbo


On August 21, Baidu Smart Cloud announced the launch of a fine-tuning service for its flagship Wenxin model, ERNIE 4.0 Turbo, helping companies use their own business data to train large models better suited to their application scenarios and greatly improving the models' effectiveness in business use. Corporate users can now apply for access on the Baidu Smart Cloud official website.

The Baidu Smart Cloud Qianfan platform previously supported fine-tuning for ERNIE 3.5, ERNIE Speed, ERNIE Lite, ERNIE Tiny, and ERNIE Character, so a total of six Wenxin large models can now be fine-tuned and used on Qianfan. In total, 21,000 models have been fine-tuned there, serving the core business scenarios of more than a thousand companies, with many success stories.

Although general-purpose large models have powerful understanding, generation, logic, and memory capabilities, as an "undergraduate" with broad general knowledge they often cannot fully meet enterprises' needs in real applications: industry- and scenario-specific requirements, customized content generation, data privacy and security, and so on.

SFT (Supervised Fine-Tuning) is the main method for fine-tuning large models. It constructs input-output pairs for a specific task to align the model's performance on that task with the capabilities of professionals. For example, if a model already has knowledge of the financial domain, SFT can teach it the basic logic and steps of research-report analysis, giving it stronger capabilities on the specific task of "research-report analysis".
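To make that concrete, here is a minimal SFT sketch using the open-source trl library; an analogue of what a managed fine-tuning service runs for you, not Baidu's actual stack. The dataset file and base model are hypothetical stand-ins:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL of rendered prompt+answer pairs, one {"text": ...} per line,
# e.g. research-report analyses written by domain professionals.
dataset = load_dataset("json", data_files="report_analysis_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",                       # small stand-in base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out", max_seq_length=2048),
)
trainer.train()   # aligns the base model's outputs with the professionals' examples
```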

In addition, opening up large-model fine-tuning services is an important way for large-model vendors to demonstrate customer data security, privacy protection, and model controllability.

Looking at the global market, OpenAI recently launched an official fine-tuning service for its flagship model GPT-4o. Industry commentators called this an important strategic move for OpenAI to respond to enterprise users' needs, sharpen its differentiation against competitors such as Google, Meta, and Anthropic, and deepen its investment in the B2B market.

Among mainstream Chinese large-model vendors, Baidu Smart Cloud is virtually the only one to have opened fine-tuning for its flagship models. It currently offers six Wenxin large models, including ERNIE 4.0 Turbo and ERNIE 3.5, and has fine-tuned 21,000 models in total, serving the core business scenarios of more than 1,000 enterprises.
 