Junyi Zhang (@junyi42)
2025-05-21 | ❤️ 207 | 🔁 21
Very impressive! At https://www.videomimic.net/, we already:
learn from 3rd-person human videos + RL → for locomotion.
Excited to see where this path goes next! https://x.com/junyi42/status/1925173717610500558/video/1
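For readers unfamiliar with the recipe, here is a minimal, hypothetical sketch of the "human videos + RL" locomotion pipeline described above: reconstruct human motion from video, retarget it to the robot, and reward an RL policy for tracking it. Every function name, joint count, and shape below is an illustrative stand-in, not VideoMimic's actual code.

```python
# Hypothetical sketch of a video -> retarget -> RL-tracking pipeline.
# All names and shapes are assumptions for illustration only.
import numpy as np

def reconstruct_human_motion(video_frames: np.ndarray) -> np.ndarray:
    """Stub: recover a 3D joint trajectory (T, J, 3) from 3rd-person video.
    In practice this is a vision model (e.g. human mesh recovery); here we
    fake it with noise so the sketch runs end to end."""
    T = len(video_frames)
    return np.random.randn(T, 24, 3) * 0.01  # 24 joints, hypothetical

def retarget_to_robot(human_traj: np.ndarray) -> np.ndarray:
    """Stub: map human joint positions to robot joint targets (T, DOF)."""
    T = human_traj.shape[0]
    return human_traj.reshape(T, -1)[:, :23]  # 23-DOF humanoid, hypothetical

def rl_tracking_reward(robot_qpos: np.ndarray, ref_qpos: np.ndarray) -> float:
    """Reward the RL policy for staying close to the reference motion."""
    return float(np.exp(-np.sum((robot_qpos - ref_qpos) ** 2)))

# Pipeline: video -> reference trajectory -> reward signal for RL training.
frames = np.zeros((120, 224, 224, 3))  # 120 frames of dummy video
reference = retarget_to_robot(reconstruct_human_motion(frames))
print(rl_tracking_reward(np.zeros(23), reference[0]))
```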
Quoted tweet:
Milan Kovac (@milankovac)
One of our goals is to have Optimus learn straight from internet videos of humans doing tasks. Those are often 3rd person views captured by random cameras etc.
We recently had a significant breakthrough along that journey, and can now transfer a big chunk of the learning directly from human videos to the bots (1st person views for now). This allows us to bootstrap new tasks much faster compared to teleoperated bot data alone (heavier operationally).
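The bootstrapping idea can be pictured as co-training one policy on a large, cheap human-video dataset mixed with a smaller teleoperated-robot dataset. The sketch below assumes a PyTorch setup; the mixing ratio, dimensions, and stub loaders are illustrative guesses, not Tesla's pipeline.

```python
# Hypothetical co-training sketch: plentiful human-video batches plus
# occasional teleop batches train a single policy. Sizes are assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 23))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def sample_batch(source: str, n: int = 64):
    """Stub loader returning (visual embedding, action target) pairs.
    `source` would pick the dataset: human-video actions would come from
    retargeted body motion, teleop actions from robot logs. Ignored here."""
    return torch.randn(n, 512), torch.randn(n, 23)

for step in range(1000):
    # Mostly human-video batches (cheap, plentiful), some teleop (grounded).
    source = "human_video" if step % 4 else "teleop"
    obs, act = sample_batch(source)
    loss = nn.functional.mse_loss(policy(obs), act)
    opt.zero_grad()
    loss.backward()
    opt.step()
```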
Many new skills are emerging through this process; they are invoked via natural language (voice/text) and run by a single neural network on the bot (multi-tasking).
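A single language-conditioned multi-task network of the kind described might look like the sketch below. The architecture, embedding sizes, and 23-DOF action space are assumptions, since Optimus's actual model is not public.

```python
# Minimal sketch of one network serving many tasks via a language command.
# Architecture and dimensions are hypothetical.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=512, text_dim=384, act_dim=23):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(obs_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs_emb, text_emb):
        # One set of weights serves every task; the command embedding
        # selects the behavior (multi-tasking via conditioning).
        return self.fuse(torch.cat([obs_emb, text_emb], dim=-1))

policy = LanguageConditionedPolicy()
obs = torch.randn(1, 512)   # e.g. camera features
cmd = torch.randn(1, 384)   # e.g. embedding of "pick up the cup"
action = policy(obs, cmd)   # (1, 23) joint targets
```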
Next: expand to 3rd-person video transfer (aka random internet), and push reliability via self-play (RL) in the real world and/or synthetic worlds (sim / world models).
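Pushing reliability with RL on top of a bootstrapped policy can be illustrated with a toy policy-gradient loop; the stub "simulator" and reward below are placeholders for a real sim or world model, not anything Tesla has described.

```python
# Toy REINFORCE loop standing in for RL fine-tuning in simulation.
# The environment and reward are placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout():
    """Stub simulator: random states, reward for preferring action 0."""
    states = torch.randn(16, 4)
    dist = torch.distributions.Categorical(logits=policy(states))
    actions = dist.sample()
    rewards = (actions == 0).float()  # stand-in for task success
    return dist.log_prob(actions), rewards

for _ in range(200):
    logp, r = rollout()
    loss = -(logp * (r - r.mean())).mean()  # REINFORCE with a mean baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```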
If you're great at AI and want to be part of its biggest real-world applications ever, you really need to join Tesla right now.