A new beta version of Stable Diffusion delivers much more aesthetic and photorealistic results than the previous version. Will this make commercial offerings obsolete?
While Stable Diffusion is the most developed open-source image model, it can’t always match the quality and especially the accessibility of commercial competitors like Midjourney.
Its strength so far is not so much in generating aesthetic images after entering a few commands, but in its openness and the possibility of further development by a constantly growing community.
Stable Diffusion XL: Beta available via DreamStudio and API
Stable Diffusion v2.1 was already a visible leap over v1.5, at least in some scenarios. The latest version, Stable Diffusion XL (v2.2.2), marks another significant improvement. It is still under development, but a beta version is already available via the paid DreamStudio web interface and the API. As usual, the code will be released on GitHub once it is finished.
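For readers who want to try the beta through the API rather than DreamStudio, the following is a minimal sketch of how a text-to-image request could be assembled. The host, endpoint path, engine ID, and payload field names are assumptions modeled on Stability AI's v1 REST API and are not taken from the announcement; check the current API reference before relying on them. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed values, not confirmed by the article: verify against the API reference.
API_HOST = "https://api.stability.ai"
ENGINE_ID = "stable-diffusion-xl-beta-v2-2-2"  # assumed engine ID for the SDXL beta


def build_text_to_image_request(prompt, api_key, width=512, height=512,
                                steps=30, samples=1):
    """Build (but do not send) a POST request for one text-to-image job."""
    payload = {
        "text_prompts": [{"text": prompt}],  # assumed field name
        "width": width,
        "height": height,
        "steps": steps,
        "samples": samples,
    }
    return urllib.request.Request(
        url=f"{API_HOST}/v1/generation/{ENGINE_ID}/text-to-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_text_to_image_request(
    "Skilled archer, bow and quiver of arrows, standing in forest clearing, "
    "intense, detailed, high detail, portrait",
    api_key="YOUR_API_KEY",  # placeholder, replace with a real key
)
```

Passing the resulting object to `urllib.request.urlopen` would submit the job. Each submitted generation draws on the account's paid quota.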
“We are pleased to announce the latest release in our Stable Diffusion series of imaging solutions. SDXL offers a variety of image generation capabilities that are transformative across multiple industries, including graphic design and architecture, with results happening right before our eyes.”
Tom Mason, CTO of Stability AI
Stable Diffusion XL comes with a number of enhancements that should pave the way for version 3. Exactly how the training material differs from previous versions is unknown. However, 80 million images are said to have been removed for v3 at the request of artists.
“Minimalistic home gym with rubber flooring, wall-mounted TV, weight bench, medicine ball, dumbbells, yoga mats, high-tech equipment, high detail, organized and efficient.”
At 2.3 billion parameters, SDXL is also significantly larger than v2.1, which has 900 million. According to Stability AI CEO Emad Mostaque, the plan is to have a distilled version ready by the time of release and to offer it as a smaller alternative.
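To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much larger SDXL is and how much memory the weights alone would occupy. The bytes-per-parameter figures are the standard sizes for 32-bit and 16-bit floats. The estimate is an assumption-laden lower bound: it ignores activations, the text encoder, and other runtime overhead, and it is not a figure published by Stability AI.

```python
# Parameter counts as reported in the article.
PARAMS = {
    "Stable Diffusion v2.1": 900e6,
    "Stable Diffusion XL (beta)": 2.3e9,
}

# Standard storage size per parameter for common floating-point formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_size_gb(num_params, precision):
    """Approximate size of the raw weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


ratio = PARAMS["Stable Diffusion XL (beta)"] / PARAMS["Stable Diffusion v2.1"]
print(f"SDXL has about {ratio:.1f}x the parameters of v2.1")

for name, count in PARAMS.items():
    for precision in BYTES_PER_PARAM:
        size = weight_size_gb(count, precision)
        print(f"{name}: ~{size:.1f} GB of weights in {precision}")
```

Under these assumptions the larger model needs roughly two and a half times the weight memory of v2.1, which is one plausible reason a distilled variant is planned.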
Stable Diffusion XL delivers more photorealistic results and a bit of text
In general, SDXL seems to deliver more accurate and higher-quality results, especially when it comes to photorealism. SDXL also handles human anatomy, which even Midjourney struggled with for a long time, much better than its predecessors, although the finger problem does not appear to be solved yet.
“Skilled archer, bow and quiver of arrows, standing in forest clearing, intense, detailed, high detail, portrait.”
In addition, Stable Diffusion XL will be able to generate text in images for the first time. The results are not always perfect, and it may take several tries before the text is correct, but this makes Stability AI the first to offer an available generative image model with text capabilities.
As usual with Stable Diffusion, SDXL’s capabilities go beyond text-to-image, supporting image-to-image (img2img) as well as the inpainting and outpainting features known from DALL-E 2. However, the maximum resolution of 512 x 512 pixels remains unchanged.
DreamStudio offers a limited free trial quota, after which the account must be recharged. 5,000 image generations cost about 10 US dollars.
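The quoted price works out to a fraction of a cent per image. The short sketch below does the arithmetic; it assumes the rate of 10 US dollars per 5,000 generations applies linearly, which ignores any variation by resolution or step count.

```python
# Pricing as quoted in the article (approximate).
PACKAGE_PRICE_USD = 10.0
IMAGES_PER_PACKAGE = 5_000


def cost_per_image(price_usd=PACKAGE_PRICE_USD, images=IMAGES_PER_PACKAGE):
    """Average price of a single generation in US dollars."""
    return price_usd / images


def estimated_cost(num_images):
    """Estimated cost in US dollars, assuming strictly linear pricing."""
    return num_images * cost_per_image()


print(f"Cost per image: ${cost_per_image():.4f}")           # Cost per image: $0.0020
print(f"Cost for 250 images: ${estimated_cost(250):.2f}")   # Cost for 250 images: $0.50
```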
“AI image generation is as good as done,” CEO Mostaque said in a Q&A on the official Discord server shortly after SDXL’s announcement. By the end of the year, he expects “pixel-perfect image generation” that is indistinguishable from real photos.