MVDream creates impressive 3D renderings from text



MVDream uses Stable Diffusion and NeRFs to generate some of the best 3D renderings yet from text prompts.

Researchers at ByteDance present MVDream (Multi-view Diffusion for 3D Generation), a diffusion model that generates high-quality 3D renderings from text prompts. Similar models already exist, but MVDream achieves noticeably higher quality and avoids two core problems of alternative approaches.

These approaches often struggle with the Janus problem and with content drift: a generated Baby Yoda ends up with multiple faces, for example, or a generated plate of waffles changes the number and arrangement of the waffles depending on the viewing angle.

To solve these problems, ByteDance trains a diffusion model based on Stable Diffusion not only on the usual prompt-image pairs, but also on multiple views of 3D objects. For this, the researchers render a large dataset of 3D models from different perspectives and camera angles.



By seeing consistent views of the same object from different angles, the model learns to produce coherent 3D shapes instead of disjointed 2D images, the team says.
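As a rough illustration of what such a rendering setup involves, the sketch below samples evenly spaced camera poses on a ring around an object; each pose could then drive one render of a 3D asset. This is a hypothetical example, not ByteDance's pipeline, and names like `sample_camera_ring` and its parameters are illustrative assumptions.

```python
# Hypothetical sketch: evenly spaced camera poses around an object,
# as one might use to render multi-view training data. Parameter
# names and conventions are illustrative, not ByteDance's code.
import numpy as np

def look_at(eye, target, up):
    """Build a 4x4 camera-to-world matrix looking from `eye` toward `target`."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward  # OpenGL convention: camera looks down its -z axis
    c2w[:3, 3] = eye
    return c2w

def sample_camera_ring(num_views=4, radius=2.5, elevation_deg=15.0):
    """Return camera-to-world matrices for evenly spaced azimuths on a ring."""
    elev = np.deg2rad(elevation_deg)
    poses = []
    for i in range(num_views):
        azim = 2.0 * np.pi * i / num_views
        eye = radius * np.array([
            np.cos(azim) * np.cos(elev),
            np.sin(azim) * np.cos(elev),
            np.sin(elev),
        ])
        poses.append(look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])))
    return poses

# Four views, 90 degrees apart, matching the idea of consistent multi-view data.
poses = sample_camera_ring(num_views=4)
```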

Video: ByteDance

MVDream to get even better with SDXL

Given a text prompt, the model first generates images of an object from several consistent perspectives. The team then uses these views to train a NeRF as a 3D representation of the object.
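As a simplified sketch of that second stage, the code below fits a tiny NeRF-style network so that its renders match a set of generated views; the volume renderer `render_fn` and the pose list are assumed inputs, not part of MVDream's released code. In the paper, the NeRF is actually optimized via score distillation against the multi-view diffusion model, so this is only a minimal stand-in for the general idea.

```python
# Minimal, hypothetical sketch: fit a NeRF-like model to multi-view images.
# `render_fn(model, pose)` is an assumed volume renderer returning an image
# tensor shaped like the targets; real NeRFs also use positional encoding.
import torch

class TinyNeRF(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Maps a 3D point to (R, G, B, density).
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, 4),
        )

    def forward(self, points):
        return self.mlp(points)

def fit_nerf(views, poses, render_fn, steps=2000, lr=1e-3):
    """`views`: images generated by the diffusion model; `poses`: their cameras."""
    model = TinyNeRF()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        for img, pose in zip(views, poses):
            pred = render_fn(model, pose)  # render the NeRF from this camera
            loss = torch.nn.functional.mse_loss(pred, img)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```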

In direct comparison with alternative approaches, MVDream shows a significant jump in quality and avoids common artifacts such as the Janus problem and content drift. According to the team, quality could improve further with newer base models such as Stable Diffusion XL (SDXL).

Video: ByteDance


MVDream's GitHub.

