Original Tweet
Releasing ViTok-v2: open-source ViT auto-encoder codebase + pretrained weights
Train your own ViT auto-encoder on any streamed (hf://) or local webdataset. The NaFlex pipeline handles any resolution and aspect ratio.
Includes reproduced 350M and 4.5B model weights: competitive at 256p, SOTA at high resolution (512p+).
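
The tweet does not say how the NaFlex pipeline copes with arbitrary resolutions and aspect ratios. As a rough illustration of the general idea behind variable-resolution patch pipelines (not ViTok-v2's actual code), the sketch below picks a patch grid that roughly preserves the image's aspect ratio while staying within a fixed token budget, then builds a padding mask. The function names, the patch size of 16, and the 256-token budget are all assumptions made for this example.

```python
import math


def fit_patch_grid(height, width, patch=16, max_tokens=256):
    """Pick a (rows, cols) patch grid for an image of any size.

    Hypothetical helper for illustration; not part of ViTok-v2's API.
    The grid roughly preserves aspect ratio and never exceeds max_tokens.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Native grid: how many patches the image would need at full size.
    rows = max(1, math.ceil(height / patch))
    cols = max(1, math.ceil(width / patch))
    if rows * cols > max_tokens:
        # Shrink both sides by the same factor so the aspect ratio is kept.
        scale = math.sqrt(max_tokens / (rows * cols))
        rows = max(1, math.floor(rows * scale))
        cols = max(1, math.floor(cols * scale))
    # Guard for extreme aspect ratios, where one side is already 1 patch.
    rows = min(rows, max_tokens)
    cols = min(cols, max_tokens // rows)
    return rows, cols


def padding_mask(rows, cols, max_tokens=256):
    """True for real patch tokens, False for padding up to max_tokens."""
    n = rows * cols
    return [True] * n + [False] * (max_tokens - n)


if __name__ == "__main__":
    for size in [(256, 256), (512, 256), (1080, 1920)]:
        grid = fit_patch_grid(*size)
        print(size, "->", grid, "tokens:", grid[0] * grid[1])
```

In a setup like this, images of different shapes can share one batch: each is resized to `rows * patch` by `cols * patch` pixels, split into patches, and padded to the same sequence length, with the mask telling attention which tokens to ignore.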

Related
- 1n-rotary-position-embeddings-rope-are-ubiquitous-across-tra → Topic: Transformer
- 1n-rotary-position-embeddings-rope-are-ubiquitous-across-transformers-that → Topic: Transformer
- iggt-instance-grounded-geometry-transformer → Topic: Transformer
- introduce-differential-transformer-v2-diff-v2-an-improved-version-of-differentia → Topic: Transformer
- introducing-shaper-a-method-for-robust-conditional-3d-shape-generation-from-casu → Topic: Transformer