MolmoAct: the first fully open Action Reasoning Model (ARM). It can ‘think’ in 3D and turn your instructions into real-world actions:

[📍 Bookmark for later]

A model that reasons in space, time, and motion.

It breaks down your command into three steps:

  1. Grounds the scene with depth-aware perception tokens
  2. Plans the motion through visual reasoning traces
  3. Executes low-level commands for real hardware

Think of it as chain-of-thought for physical action.

Give it an instruction like “Pick up the trash” and MolmoAct will:

  1. Understand the environment through depth perception
  2. Visually plan the sequence of moves
  3. Carry them out… while letting you see the plan overlaid on camera frames before anything moves
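
The overlay step in point 3 is easy to picture in code. Below is a minimal, illustrative Python sketch of drawing a planned 2D trajectory onto a camera frame before any motion is executed. The image path and waypoints are made-up placeholders (in practice the waypoints would come from MolmoAct's visual reasoning trace), and the drawing uses plain PIL rather than anything MolmoAct-specific:

```python
from PIL import Image, ImageDraw

# Placeholder inputs: a camera frame and a planned trajectory in image coordinates.
# Both are hypothetical; real waypoints would come from the model's visual trace.
frame = Image.open("camera_frame.jpg").convert("RGB")
waypoints = [(120, 340), (180, 300), (260, 250), (330, 220)]  # made-up (x, y) points

draw = ImageDraw.Draw(frame)
draw.line(waypoints, fill=(255, 0, 0), width=4)  # draw the planned path
for x, y in waypoints:
    draw.ellipse((x - 6, y - 6, x + 6, y + 6), fill=(255, 255, 0))  # mark each waypoint

# Inspect the overlaid plan before sending any commands to the robot.
frame.save("frame_with_plan.jpg")
```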

It’s steerable in real time: draw a path, change the prompt, and the trajectory updates instantly.

AAAANNNDDD: It’s completely open: checkpoints, code, and evaluation scripts are ALL PUBLIC!

Resources:
  - Models: https://huggingface.co/collections/allenai/molmoact
  - Data: https://huggingface.co/collections/allenai/molmoact-data-mixture
  - 📍 Blog: https://allenai.org/blog/molmoact
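
If you want to poke at the checkpoints, here is a minimal loading sketch under some assumptions: the repo ID below is a placeholder (check the collection for the exact checkpoint names), and the pattern of AutoProcessor plus trust_remote_code mirrors earlier Molmo releases; the MolmoAct model cards document the officially supported usage, including how to run generation.

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder repo ID -- see the Hugging Face collection above for real checkpoint names.
repo_id = "allenai/MolmoAct-7B"

# Custom model/processor code ships with the checkpoint, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",  # requires accelerate; places weights across available devices
)

# From here, follow the model card's example for feeding an image + instruction
# and decoding the perception tokens, visual trace, and action outputs.
```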

MolmoAct runs across different robot types (from gripper arms to humanoids) and adapts quickly to new tasks.

It outperforms models from major labs like NVIDIA, Google, and Microsoft on benchmark tests for generalization and real-world success rates.

For anyone building robotics systems or studying AI-driven action models, this is worth exploring… and worth sharing! ♻️
