Original Tweet


The next evolution: VLA+ models

Just yesterday @MSFTResearch released Rho-alpha (ρα), their first robotics model, built on the Phi family.

While most Vision-Language-Action (VLA) models stop at vision and language, Rho-alpha adds:

โ–ช๏ธ Tactile sensing to feel objects during manipulation โ–ช๏ธ Online learning that lets it improve from human corrections (via teleoperation, 3D mouse or other tools) in real-time even after deployment.

Together, these additions make adaptability central rather than incidental. Microsoft calls it a VLA+ model, positioning it as an extension beyond what current VLA systems support.

โžก๏ธ Today Rho-alpha can control dual-arm robot setups to perform tasks such as:

• Manipulating the BusyBox following natural-language instructions
• Plug insertion
• Toolbox packing and object arrangement with bimanual coordination

But to understand why this "plus" matters, we need to look at what came before. Here, we'll take you through the entire landscape of VLA models, including Gemini Robotics, π0, SmolVLA, Helix, ACoT-VLA, and others: https://www.turingpost.com/p/vlaplus


Tags

3D-Vision Robotics AI-ML