Junyi Zhang (@junyi42)
2025-05-21 | ❤️ 207 | 🔁 21
Very impressive! At https://www.videomimic.net/, we already:
learn from 3rd-person human videos + RL → for locomotion.
Excited to see where this path goes next! https://x.com/junyi42/status/1925173717610500558/video/1
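For readers unfamiliar with the recipe, here is a minimal, hypothetical sketch of the "human videos + RL" locomotion pipeline described above: reconstruct human motion from video, retarget it to the robot, and reward an RL policy for tracking it. Every function name, joint count, and shape below is an illustrative stand-in, not VideoMimic's actual code.

```python
# Hypothetical sketch of a video -> retarget -> RL-tracking pipeline.
# All names and shapes are assumptions for illustration only.
import numpy as np

def reconstruct_human_motion(video_frames: np.ndarray) -> np.ndarray:
    """Stub: recover a 3D joint trajectory (T, J, 3) from 3rd-person video.
    In practice this is a vision model (e.g. human mesh recovery); here we
    fake it with noise so the sketch runs end to end."""
    T = len(video_frames)
    return np.random.randn(T, 24, 3) * 0.01  # 24 joints, hypothetical

def retarget_to_robot(human_traj: np.ndarray) -> np.ndarray:
    """Stub: map human joint positions to robot joint targets (T, DOF)."""
    T = human_traj.shape[0]
    return human_traj.reshape(T, -1)[:, :23]  # 23-DOF humanoid, hypothetical

def rl_tracking_reward(robot_qpos: np.ndarray, ref_qpos: np.ndarray) -> float:
    """Reward the RL policy for staying close to the reference motion."""
    return float(np.exp(-np.sum((robot_qpos - ref_qpos) ** 2)))

# Pipeline: video -> reference trajectory -> reward signal for RL training.
frames = np.zeros((120, 224, 224, 3))  # 120 frames of dummy video
reference = retarget_to_robot(reconstruct_human_motion(frames))
print(rl_tracking_reward(np.zeros(23), reference[0]))
```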
Quoted tweet:
Milan Kovac (@milankovac)
One of our goals is to have Optimus learn straight from internet videos of humans doing tasks. Those are often 3rd person views captured by random cameras etc.
We recently had a significant breakthrough along that journey, and can now transfer a big chunk of the learning directly from human videos to the bots (1st person views for now). This allows us to bootstrap new tasks much faster compared to teleoperated bot data alone (heavier operationally).
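The bootstrapping idea can be pictured as co-training one policy on a large, cheap human-video dataset mixed with a smaller teleoperated-robot dataset. The sketch below assumes a PyTorch setup; the mixing ratio, dimensions, and stub loaders are illustrative guesses, not Tesla's pipeline.

```python
# Hypothetical co-training sketch: plentiful human-video batches plus
# occasional teleop batches train a single policy. Sizes are assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 23))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def sample_batch(source: str, n: int = 64):
    """Stub loader returning (visual embedding, action target) pairs.
    `source` would pick the dataset: human-video actions would come from
    retargeted body motion, teleop actions from robot logs. Ignored here."""
    return torch.randn(n, 512), torch.randn(n, 23)

for step in range(1000):
    # Mostly human-video batches (cheap, plentiful), some teleop (grounded).
    source = "human_video" if step % 4 else "teleop"
    obs, act = sample_batch(source)
    loss = nn.functional.mse_loss(policy(obs), act)
    opt.zero_grad()
    loss.backward()
    opt.step()
```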
Many new skills are emerging through this process; they are invoked via natural language (voice/text) and run by a single neural network on the bot (multi-tasking).
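A single language-conditioned multi-task network of the kind described might look like the sketch below. The architecture, embedding sizes, and 23-DOF action space are assumptions, since Optimus's actual model is not public.

```python
# Minimal sketch of one network serving many tasks via a language command.
# Architecture and dimensions are hypothetical.
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=512, text_dim=384, act_dim=23):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(obs_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs_emb, text_emb):
        # One set of weights serves every task; the command embedding
        # selects the behavior (multi-tasking via conditioning).
        return self.fuse(torch.cat([obs_emb, text_emb], dim=-1))

policy = LanguageConditionedPolicy()
obs = torch.randn(1, 512)   # e.g. camera features
cmd = torch.randn(1, 384)   # e.g. embedding of "pick up the cup"
action = policy(obs, cmd)   # (1, 23) joint targets
```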
Next: expand to 3rd-person video transfer (aka random internet), and push reliability via self-play (RL) in the real world and/or synthetic worlds (sim / world models).
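Pushing reliability with RL on top of a bootstrapped policy can be illustrated with a toy policy-gradient loop; the stub "simulator" and reward below are placeholders for a real sim or world model, not anything Tesla has described.

```python
# Toy REINFORCE loop standing in for RL fine-tuning in simulation.
# The environment and reward are placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout():
    """Stub simulator: random states, reward for preferring action 0."""
    states = torch.randn(16, 4)
    dist = torch.distributions.Categorical(logits=policy(states))
    actions = dist.sample()
    rewards = (actions == 0).float()  # stand-in for task success
    return dist.log_prob(actions), rewards

for _ in range(200):
    logp, r = rollout()
    loss = -(logp * (r - r.mean())).mean()  # REINFORCE with a mean baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```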
If you're great at AI and want to be part of its biggest real-world applications ever, you really need to join Tesla right now.