Generative AI - Part 4 - Image Generation - 1

Investigation

Nov 06, 2023

Image generated with DALL-E 3 depicting developers working on image generation — Author with DALL-E 3

Gencraft

Gencraft is a relatively new generative AI system that creates images from a user-written text prompt. Its free plan allows up to 10 prompts per day and returns two images per prompt. There are two paid tiers: Starter raises the quota to 25 prompts per day, and Pro offers unlimited use.

Along with the text prompt, you choose a style (anime, cyberpunk, realistic, etc.) and a model, although some models are not available on the free plan. I counted 27 styles and tried out a few with good results (see one below).

Image generated with Gencraft depicting developers working on image generation — Author with Gencraft

DALL-E / ChatGPT

DALL-E is OpenAI's tool for generating images from text. According to OpenAI, long, detailed prompts work better.

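Since DALL-E 2, DALL-E has belonged to a class of generative models called diffusion models. A diffusion model is a probabilistic model trained to reverse a gradual noising process: generation starts from random noise and removes it step by step, guided by the prompt, until an image emerges. The "scoring function" is the model's learned estimate of which direction makes a noisy image more probable.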
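To make the idea of a diffusion model a little more concrete, here is a minimal sketch of the forward (noising) half of the process, the part the model later learns to undo. It is a toy illustration in plain Python, not DALL-E's actual code: the linear schedule, its start and end values, the 1000 steps, and all function names are assumptions borrowed from the common textbook formulation of diffusion models.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (an assumed, textbook-style schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: the running product of (1 - beta_s), i.e. how much signal survives."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Jump straight to a noisy version: scale the image down and mix in Gaussian noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [1.0, -1.0, 0.5, -0.5]  # toy "image": four pixel values in [-1, 1]
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    for t in (0, 499, 999):
        noisy = add_noise(image, alpha_bars[t], rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

At the first step the image is almost untouched, and by the last step it is practically pure noise. A real image generator trains a neural network to walk this path backwards, which is where the text prompt comes in as guidance.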

DALL-E also lets you upload a photo or image and use it as the basis for generating other images.

DALL-E gives each user 15 free credits per month for generating images, and you can buy additional credits. (The free credits are only available to people who opened an account before April 6, 2023.)

OpenAI currently makes the DALL-E 2 model available to the general public, and in October 2023 it made DALL-E 3 available to paying users. DALL-E 3 is built natively on ChatGPT, so it uses ChatGPT's language understanding to interpret text prompts more accurately.

I used DALL-E 2 to generate the futuristic image you see below. I don’t think the quality is that good, but it could be due to my poor description in the input prompt.

Image generated with DALL-E 2 depicting both historical and futuristic elements — Author with DALL-E 2

I signed up for a Pro membership to try out DALL-E 3, but it was not directly available. Instead, I used ChatGPT 4 in DALL-E mode: you open ChatGPT and select one of the two available modes, either Default, which is ChatGPT 4 in text mode, or DALL-E, which returns images. I described a fairly professional task to ChatGPT with the prompt shown below:

Examples of icon generation using DALL-E 3

It is an amazing result, given the context and the simplified description I used in the prompt. I then noticed that the numbers in the third icon did not make any sense, so I asked DALL-E to fix this.

Example of generating variations on icons for iOS applications — Author with DALL-E 3

Quite handy!

I also tried the same thing with another app icon that I had in mind.

Generating iOS application icons with DALL-E 3 — Author with DALL-E 3

This is an epic collection of icons that could be used professionally for an application. Check the detailed explanations DALL-E provides to describe the 4 icons.

I liked the first icon but thought it was a little too dark for my taste, so DALL-E to the rescue!

Generating a variation on icons with DALL-E 3

I went on to design yet another icon set, this time describing the icons in greater detail. Looking at the output, it is clear that the results are much better when the prompt has more detail in it. DALL-E captured the essence of what I was trying to describe as the content of the application and combined various elements to give me almost exactly what I wanted. The only problem is that the icons do not necessarily look that great on the iPhone screen, where they are displayed at a much smaller size.
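One plausible reason detailed icons suffer at small sizes is simple averaging: when an image is shrunk, neighboring pixels are blended, so thin lines and fine textures wash out while bold shapes survive. The sketch below illustrates this with a toy grayscale image in plain Python. The 64-pixel image, the stripe widths, and the function names are all made up for the illustration, and real image scalers use more refined filters than this block average.

```python
def striped_image(size, stripe_width):
    """Square grayscale image with vertical stripes alternating between 0 and 255."""
    return [
        [255 if (x // stripe_width) % 2 else 0 for x in range(size)]
        for _ in range(size)
    ]


def box_downsample(pixels, factor):
    """Shrink a square image by averaging each factor x factor block into one pixel."""
    size = len(pixels)
    if size % factor != 0:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for by in range(0, size, factor):
        row = []
        for bx in range(0, size, factor):
            block = [
                pixels[y][x]
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def contrast(pixels):
    """Difference between the brightest and darkest pixel."""
    values = [v for row in pixels for v in row]
    return max(values) - min(values)


if __name__ == "__main__":
    fine = striped_image(64, 1)    # hairline detail, 1-pixel stripes
    bold = striped_image(64, 16)   # chunky shapes, 16-pixel stripes
    print(contrast(box_downsample(fine, 8)))  # 0.0: the detail blends into flat gray
    print(contrast(box_downsample(bold, 8)))  # 255.0: the bold shapes stay crisp
```

If this is what is happening, asking for bolder, simpler shapes in the prompt should help the icons hold up better when they are scaled down.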

I also used DALL-E 3 regularly for my Substack posts and generated art about the topics I covered.

I must say I’m pretty impressed with this iteration of ChatGPT/DALL-E. I believe this is due to the much better language processing of the very large model behind ChatGPT 4 (OpenAI has not published its parameter count), but also possibly due to other techniques introduced in the training phase.

I’ve also seen some criticism of DALL-E 3’s performance in comments, so not everyone shares my evaluation.

Stable Diffusion

Stable Diffusion XL is yet another diffusion model. DreamStudio is the application from Stability AI using the Stable Diffusion XL model. The application gives users a certain number of free credits that are renewable monthly.

There is typically a queue of around 30 jobs when you submit a prompt. Helpfully, SD shows where your job is in the queue, which makes the wait feel shorter than it is (I did not notice any real difference in total production time compared with other image-generation AI tools). SD also has some advanced options that you can use to tweak the prompt.

I used it to produce an image of two gunslingers against a backdrop of skyscrapers in a city in ruins. As you can see, the image looks like it came straight out of the latest action feature, with believable protagonists. Also note that the background is properly blurred in cinematic mode.

Image generated with Stable Diffusion and depicting two detectives — Author with Stable Diffusion Online

I also used it to produce the iOS icons for the diet app that I had previously tried with ChatGPT as well.

Generating iOS application icons with Stable Diffusion — Author with Stable Diffusion XL

Stability AI’s site and others claim that Stable Diffusion XL is the most photorealistic image-generation model available. I think the claim is justified: I haven’t been able to try it in other modes, but the photorealism is obvious in the images it produced.

(to be continued…)
