Original Tweet


The next evolution: VLA+ models

Just yesterday @MSFTResearch released Rho-alpha (ρα), their first robotics model, built on the Phi family.

While most Vision-Language-Action (VLA) models stop at vision and language, Rho-alpha adds:

โ–ช๏ธ Tactile sensing to feel objects during manipulation โ–ช๏ธ Online learning that lets it improve from human corrections (via teleoperation, 3D mouse or other tools) in real-time even after deployment.

Together, these additions make adaptability central rather than incidental. Microsoft calls it a VLA+ model, positioning it as an extension beyond what current VLA systems support.

โžก๏ธ Today Rho-alpha can control dual-arm robot setups to perform tasks such as:

• Manipulating the BusyBox following natural-language instructions
• Plug insertion
• Toolbox packing and object arrangement with bimanual coordination

But to understand why this "plus" matters, we need to look at what came before. Here, we'll take you through the entire landscape of VLA models, including Gemini Robotics, π0, SmolVLA, Helix, ACoT-VLA, and others: https://www.turingpost.com/p/vlaplus


Tags

3D-Vision Robotics AI-ML