Ilir Aliu (@IlirAliu_)

2026-01-30 | ❤️ 206 | 🔁 30 | 💬 2

MolmoAct is the first fully open Action Reasoning Model (ARM). It can ‘think’ in 3D and turn your instructions into real-world actions:

[📍 Bookmark for later]

A model that reasons in space, time, and motion.

It breaks down your command into three steps:

Grounds the scene with depth-aware perception tokens
Plans the motion through visual reasoning traces
Executes low-level commands for real hardware

Think of it as chain-of-thought for physical action.

Give it an instruction like “Pick up the trash” and MolmoAct will:

Understand the environment through depth perception
Visually plan the sequence of moves
Carry them out… while letting you see the plan overlaid on camera frames before anything moves
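
To make the three stages concrete, here is a toy sketch of a ground, plan, execute loop with a preview step before anything moves. This is not MolmoAct's code or API: every name in it (`ground_scene`, `plan_trace`, `to_actions`, `run`, `Plan`) is hypothetical, the depth "tokens" are simple quantized bins, and the "plan" is a straight line in image space. It only illustrates the shape of the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]  # pixel coordinates (x, y) in the camera frame


@dataclass
class Plan:
    instruction: str
    depth_tokens: List[int]          # stage 1 output: discrete depth codes
    waypoints: List[Point]           # stage 2 output: trace in image space
    actions: List[Tuple[int, int]]   # stage 3 output: per-step (dx, dy) deltas


def ground_scene(depth_map: List[List[float]], bins: int = 4,
                 max_depth: float = 2.0) -> List[int]:
    """Stage 1 (toy): quantize each depth value into one of `bins` tokens."""
    tokens = []
    for row in depth_map:
        for d in row:
            clipped = min(max(d, 0.0), max_depth)
            tokens.append(min(int(clipped / max_depth * bins), bins - 1))
    return tokens


def plan_trace(start: Point, goal: Point, steps: int = 4) -> List[Point]:
    """Stage 2 (toy): evenly spaced straight-line waypoints from start to goal."""
    return [
        (round(start[0] + (goal[0] - start[0]) * i / steps),
         round(start[1] + (goal[1] - start[1]) * i / steps))
        for i in range(steps + 1)
    ]


def to_actions(waypoints: List[Point]) -> List[Tuple[int, int]]:
    """Stage 3 (toy): turn consecutive waypoints into relative moves."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(waypoints, waypoints[1:])]


def run(instruction: str, depth_map: List[List[float]], start: Point,
        goal: Point, approve: Callable[[List[Point]], bool]) -> Optional[Plan]:
    """Ground, plan, show the trace for approval, then produce actions."""
    tokens = ground_scene(depth_map)
    trace = plan_trace(start, goal)
    if not approve(trace):  # the preview step: nothing moves if rejected
        return None
    return Plan(instruction, tokens, trace, to_actions(trace))


if __name__ == "__main__":
    plan = run("Pick up the trash", [[0.5, 1.0], [1.5, 2.0]],
               start=(0, 0), goal=(8, 4), approve=lambda trace: True)
    print(plan)
```

The `approve` callback stands in for the human looking at the overlaid trace: returning `False` stops the pipeline before any action is generated.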

It’s steerable in real time: draw a path, change the prompt, and the trajectory updates instantly.

AAAANNNDDD: It’s completely open: checkpoints, code, and evaluation scripts are ALL PUBLIC!

MolmoAct runs across different robot types (from gripper arms to humanoids) and adapts quickly to new tasks.

It outperforms models from major labs like NVIDIA, Google, and Microsoft on benchmark tests for generalization and real-world success rates.

For anyone building robotics systems or studying AI-driven action models, this is worth exploring… and worth sharing! ♻️

📄 Original content

All models for the MolmoAct (Multimodal Open Language Model for Action) release.

All datasets for the MolmoAct (Multimodal Open Language Model for Action) release.

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.