Shap-E is OpenAI’s fastest text-to-3D model to date



Summary

OpenAI dominates the media with ChatGPT, but the company is also researching other generative AI models. A new paper shows a text-to-3D model.

In late 2022, OpenAI unveiled Point-E, a generative AI model for text-to-3D that received little attention given the enormous success of ChatGPT that same month. In part, this was because Point-E did not produce particularly impressive results.

With Point-E, OpenAI attempted to deliver a particularly fast text-to-3D model based on point clouds. Almost half a year later, the company’s researchers are now presenting Shap-E, a direct successor.

Shap-E is extremely fast and a bit better

Unlike Point-E, Shap-E does not generate a point cloud. Instead, it directly generates the parameters of implicit functions that can be rendered as both textured meshes and NeRFs. According to the paper, an encoder is first trained to map 3D assets to these parameters, and a diffusion model conditioned on text or image input then generates them for the desired 3D object.
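To make the idea of an implicit function more concrete, here is a minimal toy sketch in Python. It is not Shap-E's architecture: the sphere function, the four-value parameter tuple and the helper names are assumptions chosen purely for illustration. The point is only that a shape stored as function parameters can be sampled at any resolution, whereas a point cloud is fixed once it is generated.

```python
import math


def make_sphere_sdf(params):
    """Build an implicit function f(x, y, z) -> signed distance from `params`.

    `params` is a hypothetical 4-tuple (cx, cy, cz, radius). A model like
    Shap-E outputs far more parameters (the weights of a neural network),
    but the principle is the same: the parameters alone define the shape.
    """
    cx, cy, cz, radius = params

    def sdf(x, y, z):
        # Negative inside the surface, zero on it, positive outside.
        return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) - radius

    return sdf


def occupied_cells(sdf, resolution):
    """Sample the function on a resolution^3 grid over [-1, 1]^3.

    Returns how many grid points lie inside or on the surface.
    """
    step = 2.0 / (resolution - 1)
    count = 0
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                x, y, z = -1.0 + i * step, -1.0 + j * step, -1.0 + k * step
                if sdf(x, y, z) <= 0.0:
                    count += 1
    return count


# The same four numbers can be queried coarsely or finely.
params = (0.0, 0.0, 0.0, 0.5)
shape = make_sphere_sdf(params)
coarse = occupied_cells(shape, 8)
fine = occupied_cells(shape, 32)
print(f"inside points at 8^3: {coarse}, at 32^3: {fine}")
```

Because the representation is a function rather than a list of points, the same parameters can in principle feed a mesh extractor or a volume renderer, which is the flexibility the article refers to.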

Shap-E and Point-E produce similar results, but the former is slightly faster and can be more easily linked to other methods. | Picture: OpenAI

As with its predecessor, the quality of Shap-E's renderings sometimes falls far short of alternatives such as DreamFusion, Dreamfields, Magic3D, Dream3D or CLIP-Mesh. However, while CLIP-Mesh needs 17 minutes, DreamFusion 12 hours and Dreamfields as much as 200 hours to produce a model on an Nvidia V100 GPU, Shap-E needs only 13 seconds with text input and about one minute with image input.
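To put these figures side by side, the short Python sketch below converts the times quoted above to seconds and divides them by Shap-E's 13 seconds for text input. The numbers are taken from this article only; the variable names are illustrative, and the ratios are rough, since the methods differ in output quality.

```python
# Generation times per model as quoted in the article (single Nvidia V100 GPU).
SHAP_E_TEXT_SECONDS = 13

reported_times_seconds = {
    "CLIP-Mesh": 17 * 60,        # 17 minutes
    "DreamFusion": 12 * 3600,    # 12 hours
    "Dreamfields": 200 * 3600,   # 200 hours
}

# How many times longer each method takes than Shap-E with text input.
speedups = {
    name: seconds / SHAP_E_TEXT_SECONDS
    for name, seconds in reported_times_seconds.items()
}

for name, factor in speedups.items():
    print(f"{name}: roughly {factor:,.0f}x the time Shap-E needs for text input")
```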

Shap-E can be combined with DreamFusion

OpenAI says the results “highlight the potential of generating implicit representations, especially in domains like 3D where they can offer more flexibility than explicit representations.”

However, Shap-E also has numerous limitations: it struggles to assign multiple attributes to an object and to represent the correct number of objects. The team attributes these shortcomings to limited training data and believes they could be reduced by collecting or generating larger, labeled 3D datasets. In addition, the quality of the objects is limited.

To achieve better results, Shap-E could be combined with other optimization-based generative 3D techniques. For example, the team shows that a Shap-E sample can be refined as a NeRF with DreamFusion.

If OpenAI finds a suitable architecture, the company will likely scale it up. Whether that will be Shap-E remains to be seen, but projects like Objaverse are creating large databases of labeled 3D data.

GitHub.
