
Published in MLearning.ai

DALL·E 2 vs Midjourney vs Stable Diffusion

Generated images

Text-to-image generation has been around for quite some time now. It began with the development of generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). You can read more about GANs here.

Looking at the broader picture, text-to-image models combine the Computer Vision (CV) and Natural Language Processing (NLP) subdomains.

If we look at these models closely, DALL·E 2 is not open to the public, but you can join the program by request from here. Midjourney, on the other hand, offers its service through its Discord channel. Neither of these is open source, and they are likely to remain that way. Stable Diffusion, however, is an open-source model: you can find online workspaces as well as Google Colab notebooks to use it. All of these models were trained on a considerable amount of image and text data; their inner workings will be discussed in another article. This time we will compare a set of prompts and how each model reacts to them.
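Since Stable Diffusion is open source, you can run it yourself rather than going through a waitlist or a Discord bot. Below is a minimal sketch using Hugging Face's `diffusers` library, which is what the public Colab notebooks typically wrap; the checkpoint name, prompt, and output filename are my own choices for illustration, not something prescribed by the article. Note that actually generating an image requires a GPU and downloads several gigabytes of weights, so the sketch checks for both before running.

```python
# Sketch: text-to-image with Stable Diffusion via the `diffusers` library.
# Assumptions: the runwayml/stable-diffusion-v1-5 checkpoint and a CUDA GPU.
try:
    import torch
    from diffusers import StableDiffusionPipeline
    HAVE_DIFFUSERS = True
except ImportError:
    HAVE_DIFFUSERS = False

PROMPT = "rainy train station, noir style, ultra realistic"

def generate(prompt: str):
    """Load the pipeline and render one image for the given prompt."""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    # Fewer inference steps trade quality for speed.
    return pipe(prompt, num_inference_steps=30).images[0]

if HAVE_DIFFUSERS and torch.cuda.is_available():
    generate(PROMPT).save("noir_station.png")
else:
    print("diffusers or a GPU is unavailable; try a Colab GPU runtime")
```

In a Colab notebook with a GPU runtime, the same few lines are all you need; the free tier is usually enough for single images at default settings.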

The prompts below help illustrate the capabilities of each model.

1. Epic style of katsuhiro otomo, wide view, filmed in amazing cinematic light, epic cyberpunk night background, 8k, high resolution, ultrarealistic, photorealistic, intricate, insanely detailed, octane render, unreal engine 5

DALL·E 2 (left), Midjourney (center) and Stable Diffusion (right)

2. A very detailed surreal photo of a ninja fighting a dragon spitting fire

DALL·E 2 (left), Stable Diffusion (center) and Midjourney (right)

3. Rainy train station, noir style, 3dsmax + vray render, extremly detailed, ultra realistic, unreal engine 5

Stable Diffusion (left), DALL·E 2 (center) and Midjourney (right)

4. London, zombie apocalypse, extreme detail, horror

Stable Diffusion (left), DALL·E 2 (center) and Midjourney (right)

5. A beautiful Sri lankan woman wearing traditional clothes half immerged in the Ganges river looking at the camera with an hypnotizing glare. CANON Eos C300, ƒ4, 15mm, natural lights

DALL·E 2 (left), Stable Diffusion (center) and Midjourney (right)
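Notice the pattern shared by most of the prompts above: a core subject followed by a comma-separated tail of style modifiers ("noir style", "ultra realistic", "unreal engine 5", and so on). If you experiment with many modifier combinations, a tiny helper (purely illustrative, not part of any of these tools) keeps the prompts consistent:

```python
def build_prompt(subject: str, modifiers: list[str]) -> str:
    """Join a subject with comma-separated style modifiers,
    mirroring the subject-then-modifiers pattern of the prompts above."""
    return ", ".join([subject] + list(modifiers))

prompt = build_prompt(
    "rainy train station",
    ["noir style", "ultra realistic", "unreal engine 5"],
)
print(prompt)  # rainy train station, noir style, ultra realistic, unreal engine 5
```

This makes it easy to hold the subject fixed and swap modifier sets when comparing how each model responds to style keywords.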

After playing with these three tools, my observation is that DALL·E 2 works well on natural human images. Midjourney produced rich colors and realistic images in all attempts. But keep in mind that neither of these two models is free. Stable Diffusion has large community support thanks to its open-source nature, so we can expect more advances from it in the coming days.

Even today, with the popularity of ChatGPT and similar tools, we can see artists using image-generation tools alongside them to create wonders. Going forward, these kinds of conversational AI models will also challenge popular search engines such as Google.

Nowadays digital comics as well as digital art are turning over a new leaf thanks to these tools, and I look forward to seeing what comes next.

If you would like to read similar articles and get notified about new ones, join Medium using this link.
