Embodied AI Reading Notes (@EmbodiedAIRead)
2025-10-29
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Project: https://ctrl-world.github.io/
Paper: https://arxiv.org/abs/2510.10125
Code: https://github.com/Robert-gyj/Ctrl-World
This work introduces a controllable multiview world model for robot manipulation that supports consistent, long-horizon (20-second) policy-in-the-loop interaction inside the model's imagination. These imagined interactions can be used to evaluate and improve the instruction following of modern generalist robot policies.
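A minimal sketch of what "policy-in-the-loop" interaction in imagination means: the policy only ever sees frames predicted by the world model, and the world model only ever sees actions proposed by the policy. The names `imagine_rollout`, `policy`, and `world_model`, the action-chunk interface, and the list-based "frames" are all hypothetical; the post does not describe Ctrl-World's actual interfaces or tensor shapes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a frame is a flat list of floats, an observation is
# one frame per camera view (e.g. third-person views plus a wrist camera),
# and an action is a list of floats such as an end-effector command.
Frame = List[float]
Observation = List[Frame]
Action = List[float]


@dataclass
class Rollout:
    observations: List[Observation] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)


def imagine_rollout(
    policy: Callable[[Observation, str], Sequence[Action]],
    world_model: Callable[[Observation, Sequence[Action]], List[Observation]],
    initial_obs: Observation,
    instruction: str,
    num_chunks: int,
) -> Rollout:
    """Alternate policy and world model without touching a real robot."""
    rollout = Rollout(observations=[initial_obs])
    obs = initial_obs
    for _ in range(num_chunks):
        # The policy proposes an action chunk from the latest imagined views.
        chunk = list(policy(obs, instruction))
        # The world model predicts one multiview observation per action.
        predicted = world_model(obs, chunk)
        if len(predicted) != len(chunk):
            raise ValueError("world model must return one observation per action")
        rollout.actions.extend(chunk)
        rollout.observations.extend(predicted)
        # The last predicted observation becomes the policy's next input.
        obs = predicted[-1]
    return rollout
```

Because the loop never queries a simulator or a physical robot, any policy with this call signature can be rolled out for as many chunks as the world model stays coherent.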
-
The Ctrl-World world model is initialized from a pretrained video diffusion backbone built on spatial-temporal transformers. It introduces three key adaptations that turn it into a policy-compatible interactive simulator: (1) joint multi-view prediction, including wrist cameras; (2) a pose-conditioned memory retrieval mechanism; (3) frame-level action conditioning.
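To make adaptation (2) concrete, here is a hedged illustration of the general idea of pose-conditioned retrieval: store past frames together with the robot pose at which they were recorded, and, when predicting ahead, fetch the stored frames whose poses are closest to the current one. `PoseMemory`, the Euclidean pose distance, and the top-k rule are assumptions for illustration only; the paper's actual retrieval mechanism may differ.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]


class PoseMemory:
    """Hypothetical memory bank of (pose, frame_id) pairs from a rollout."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Pose, int]] = []

    def add(self, pose: Sequence[float], frame_id: int) -> None:
        # Record which frame was observed at which robot pose.
        self._entries.append((tuple(pose), frame_id))

    def retrieve(self, query_pose: Sequence[float], k: int) -> List[int]:
        # Rank stored frames by Euclidean distance between poses.
        # sorted() is stable, so ties keep insertion order.
        q = tuple(query_pose)
        ranked = sorted(self._entries, key=lambda e: math.dist(e[0], q))
        return [frame_id for _, frame_id in ranked[:k]]
```

Keying memory on pose rather than on recency is what lets a model recall how a scene looked from a similar viewpoint many seconds earlier, which is one plausible reason such a mechanism would help long rollouts stay consistent.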
-
Policy evaluation use case: the authors show that imagination-based evaluation with Ctrl-World faithfully reflects policies' real-world instruction-following ability, and that the model sustains coherent rollouts for over 20 seconds in novel scenes beyond its DROID training data.
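One way to check whether imagined evaluation "faithfully reflects" real-world ability is to compute each policy's success rate in imagination and in the real world, then measure how well the two rankings agree. The sketch below uses a Pearson correlation for that comparison; the metric, the `agreement` helper, and the boolean success labels are assumptions, not the paper's stated protocol.

```python
import math
from typing import Dict, Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    # Fraction of rollouts judged successful.
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Sample Pearson correlation between two equal-length sequences.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def agreement(
    imagined: Dict[str, Sequence[bool]], real: Dict[str, Sequence[bool]]
) -> float:
    """Correlate per-policy success rates in imagination vs. the real world."""
    policies = sorted(imagined.keys() & real.keys())
    xs = [success_rate(imagined[p]) for p in policies]
    ys = [success_rate(real[p]) for p in policies]
    return pearson(xs, ys)
```

A value near 1.0 means policies that do well in imagination also do well on the real robot, which is the property that makes a world model usable as a cheap evaluation proxy.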
-
Policy improvement use case: the authors demonstrate that collecting successful synthetic rollouts for novel instructions inside the world model, and using them for supervised fine-tuning, improves policy performance.
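The data side of this recipe can be sketched as "filter, then flatten": keep only imagined rollouts judged successful and turn them into (observation, instruction, action) training examples. `ImaginedRollout`, `build_sft_dataset`, and the `is_success` judge are hypothetical; the post does not say how success is labeled, and the pairing below assumes `observations[i]` is the frame the policy saw when it emitted `actions[i]`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class ImaginedRollout:
    instruction: str
    observations: List[Any]  # observations[i] precedes actions[i]
    actions: List[Any]


def build_sft_dataset(
    rollouts: Iterable[ImaginedRollout],
    is_success: Callable[[ImaginedRollout], bool],
) -> List[Tuple[Any, str, Any]]:
    """Keep successful imagined rollouts as (obs, instruction, action) pairs."""
    dataset: List[Tuple[Any, str, Any]] = []
    for r in rollouts:
        # Failed rollouts are discarded rather than used as negatives.
        if not is_success(r):
            continue
        # zip stops at the shorter list, so a trailing final observation
        # with no following action is dropped.
        for obs, act in zip(r.observations, r.actions):
            dataset.append((obs, r.instruction, act))
    return dataset
```

The resulting list can then feed an ordinary behavior-cloning loss on the original policy.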
