Bilawal Sidhu (@bilawalsidhu)
2025-07-02 | ❤️ 371 | 🔁 47
This BlenderFusion paper basically says “screw trying to describe 3D edits through text” and just… use Blender :-)
The idea is pretty straightforward — instead of trying to cram 3D understanding into a diffusion model, use depth estimation & segmentation to project 2D images into 2.5D meshes, edit them in actual 3D software, then use a fine-tuned diffusion model to make the results photorealistic again.
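To make the lifting step concrete, here's a rough sketch of the 2.5D unprojection (my own illustration, not the paper's code; the function name and intrinsics are placeholders). A monocular depth map plus an object mask gives you one 3D point per pixel via the pinhole camera model, and triangulating those points over the pixel grid gives a mesh you can import into Blender:

```python
import numpy as np

def unproject_to_pointcloud(depth, mask, fx, fy, cx, cy):
    """Lift a 2D object into 2.5D: back-project each masked pixel to a 3D
    point using its estimated depth and the camera intrinsics.
    `depth` and `mask` are H x W arrays (depth from any monocular depth
    estimator, mask from any segmentation model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth[mask]
    x = (u[mask] - cx) * z / fx                      # pinhole camera model
    y = (v[mask] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)              # (N, 3) points, one per pixel
```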
The clever bit is their “dual-stream architecture” — the model sees both the original scene AND the edited Blender render in parallel, learning to preserve what matters while fixing the inevitable artifacts from transforming imperfect 2.5D/3D reconstructions.
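A toy sketch of that dual-stream idea (names and wiring are my own, not the paper's code): encode the original frame and the coarse Blender render in parallel, then hand the diffusion denoiser both streams so it can copy appearance from the source while following the edited geometry.

```python
import torch
import torch.nn as nn

class DualStreamConditioner(nn.Module):
    """Illustrative dual-stream conditioning: two parallel inputs, one denoiser."""
    def __init__(self, encoder: nn.Module, denoiser: nn.Module):
        super().__init__()
        self.encoder = encoder       # shared weights for both streams in this sketch
        self.denoiser = denoiser

    def forward(self, noisy_latent, t, source_img, edited_render):
        src = self.encoder(source_img)        # stream 1: the original scene
        tgt = self.encoder(edited_render)     # stream 2: the edited (imperfect) Blender render
        cond = torch.cat([src, tgt], dim=1)   # channel-concat; cross-attention would also work
        return self.denoiser(noisy_latent, t, cond)
```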
They train it with smart masking strategies so it learns when to ignore the original scene (for removals/replacements) and can manipulate objects independently of camera motion.
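Something like the following captures the spirit of that masking (the rates and specifics are my guesses, not the paper's recipe): sometimes blank out the whole source stream, sometimes only the regions touched by the edit, so the model can't lean on the original pixels when an object was removed or replaced.

```python
import torch

def training_step(model, batch, t, p_drop_source=0.2, p_mask_edited=0.3):
    """Hypothetical masking schedule for the dual-stream model sketched above."""
    source = batch["source_img"].clone()
    if torch.rand(1).item() < p_drop_source:
        source.zero_()                               # ignore the original scene entirely
    elif torch.rand(1).item() < p_mask_edited:
        source = source * (1 - batch["edit_mask"])   # hide only the edited regions
    return model(batch["noisy_latent"], t, source, batch["edited_render"])
```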
What you get is pretty impressive control — not just moving objects around, but changing materials, deforming shapes, swapping backgrounds, all while maintaining visual coherence.
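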
Neural Assets (one of my favorite papers from last year) tried to crack this with learned object tokens, but it struggled with overlapping objects and lost fine details (due to low-res DINO encodings).
BlenderFusion just sidesteps the whole problem — want to rotate something 173.5 degrees? Just rotate it in Blender. Want to duplicate an object 8 times? Copy paste away. The diffusion model’s only job is making it look photorealistic, not figuring out the 3D underpinnings.
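For a sense of how low-tech the editing half really is, it's just ordinary Blender scripting (or clicks in the UI). A minimal sketch, assuming the lifted mesh was imported as an object named "car" (name and spacing are placeholders):

```python
import bpy, math

car = bpy.data.objects["car"]
car.rotation_euler[2] += math.radians(173.5)   # rotate 173.5 degrees around Z

for i in range(8):                             # duplicate the object 8 times
    dup = car.copy()
    dup.data = car.data.copy()
    dup.location.x += (i + 1) * 2.0            # spread the copies along X
    bpy.context.collection.objects.link(dup)

# Render the edited scene; the fine-tuned diffusion model then makes it photorealistic again.
```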
The catch? Lacks temporal consistency for animation. Each frame is generated independently, so while a single edit looks great, smoothly animating a car driving down the street (or a camera move through the scene) won't work; you'd get flickering and inconsistencies between frames.
That said, this approach is so much more intuitive for finer-grained image editing than trying to describe your changes in text prompts.
It’s the kind of thing that makes you wonder why we’re trying to do everything inside neural networks when perfectly good 3D tools already exist — giving you the best of both worlds.
🔗 Related
See similar notes in domain-vision-3d, domain-genai, domain-web-graphics, domain-dev-tools
Tags
type-paper, domain-vision-3d, domain-genai, domain-web-graphics, domain-dev-tools