The world's first general-purpose embodied base model, Genie Operator-1 (GO-1), has been officially released. What makes this model stand out?
▼GO-1's core innovation is the ViLLA architecture (Vision-Language-Latent-Action), which deeply integrates a vision-language model (VLM), a mixture-of-experts (MoE) system, and latent action planning:
¹VLM: Pretrained on massive Internet image-text data, it gives the robot the ability to "see the world" and "understand language", such as recognizing cups of different shapes and understanding the command "pour water". ²MoE Latent Planner: Trained on cross-embodiment and human-operation video data, it distills action logic so the robot can "think through the steps" like a human, for example abstracting the general strategy "move, wipe, avoid obstacles" from a video of wiping a table.
³MoE Action Expert: Trained on millions of real-robot trajectories, it achieves millimeter-level precision in manipulation, such as pouring water without spilling and adaptively adjusting grip force when grasping fragile items.
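The three-stage flow described above can be sketched in code. This is a minimal illustrative skeleton, not the real GO-1 implementation: every class, method, and return value here is a hypothetical stand-in assumed for clarity, showing only how perception, latent planning, and low-level action decoding would hand off to one another.

```python
# Hypothetical sketch of the ViLLA pipeline (VLM -> latent planner -> action expert).
# All names and values are illustrative assumptions, not the actual GO-1 API.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: object      # camera frame (placeholder)
    instruction: str   # natural-language command, e.g. "pour water"


class VLM:
    """Stage 1: perceive the scene and parse the language command."""
    def encode(self, obs: Observation) -> List[float]:
        # A real VLM would be a multimodal transformer pretrained on
        # web image-text data; here we return a stand-in feature vector.
        return [float(len(obs.instruction))]


class LatentPlanner:
    """Stage 2 (MoE): map scene+command features to abstract action steps."""
    def plan(self, features: List[float]) -> List[str]:
        # A real planner would emit learned latent action tokens; this
        # fixed list merely illustrates a "think in steps" decomposition.
        return ["approach", "act", "retreat"]


class ActionExpert:
    """Stage 3 (MoE): decode each abstract step into low-level motor commands."""
    def decode(self, step: str) -> List[float]:
        # A real expert, trained on real-robot trajectories, would output
        # precise joint commands; this placeholder returns zeros.
        return [0.0, 0.0, 0.0]


def go1_pipeline(obs: Observation) -> List[List[float]]:
    """Run the three stages in sequence: one motor command per planned step."""
    features = VLM().encode(obs)
    steps = LatentPlanner().plan(features)
    return [ActionExpert().decode(step) for step in steps]


commands = go1_pipeline(Observation(image=None, instruction="pour water"))
print(len(commands))  # one low-level command per abstract step
```

The design point the sketch captures is the separation of concerns: the VLM and latent planner handle "what to do" from broad, cheap data (web images, human videos), while only the action expert needs scarce, expensive real-robot data to learn "how to move".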
With the three working together, GO-1's success rate on tasks such as pouring water and cleaning is 32% higher than that of existing models, and its few-shot learning ability breaks through the data bottleneck: only a handful of demonstration videos are needed to generalize to new skills.