Tag Archives: Stable Diffusion

Stable Diffusion – Playing with parameters

It is fun to make images with Stable Diffusion, but it is also frustrating when the result is not what you expect and it takes a long time to generate new pictures.

I have been playing with the CPU-only branch of Stable Diffusion on a Linux computer with an 8th generation Core i7 CPU and 16GB of RAM, and here are some findings.

Basic prompt and parameters

I wanted to generate a useful picture for my Dungeons & Dragons game. So, as a somewhat qualified start I did:

fantasy art of village house, cliff, well, town square, market and storm, in the style of greg rutkowski

I used

  • seed=2 (because I did not like seed=1)
  • Sample Steps=10
  • Guide=7.5
  • Sample Model=Euler Ancestral
  • Resolution=512×512
  • The 1.4 model (the small one)

Not very far from the default settings. My performance is about 10s per sample step, so this picture took 1m40s to generate:

This is the unmodified 512×512 picture. Below I will publish smaller/scaled pictures, but unless otherwise mentioned they are all generated at 512×512. This picture was not so far from what I had in mind, but I don’t see any well, market, or town square.
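The timing arithmetic is easy to check; a small helper (the 10s/step figure is just my measured speed on this machine, not anything universal):

```python
def generation_time(steps, seconds_per_step=10):
    """Rough wall-clock estimate for one image, given a measured
    per-step cost (about 10s/step on my CPU-only setup)."""
    total = steps * seconds_per_step
    return f"{total // 60}m{total % 60:02d}s"

print(generation_time(10))  # 1m40s, matching the picture above
print(generation_time(30))  # 5m00s
```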

Sample Methods

I generated exactly the same thing, only changing the Sample Method parameter:

Three of the sample methods took roughly twice as long (the 200% in the names above). I can at least draw the conclusion that the sampling method is not just a mathematical detail but something that affects the output quite a lot.
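Part of the explanation, as I understand it, is that the methods differ in kind, not just in numerics: the "Ancestral" samplers inject fresh random noise at every step, while the plain solvers are deterministic. A toy sketch of that difference (not the real solver math, just the shape of one denoising step):

```python
import random

def euler_step(x, denoise, dt):
    # Plain Euler: deterministic, the next value depends only on x.
    return x + dt * denoise(x)

def euler_ancestral_step(x, denoise, dt, sigma, rng):
    # Ancestral variant: same drift, plus freshly sampled noise,
    # so the trajectory diverges from the deterministic one.
    return x + dt * denoise(x) + sigma * rng.gauss(0.0, 1.0)

denoise = lambda x: -x  # stand-in "denoiser" pulling the sample toward zero

rng = random.Random(2)  # the extra noise is also controlled by the seed
print(euler_step(1.0, denoise, 0.1))                      # 0.9, every time
print(euler_ancestral_step(1.0, denoise, 0.1, 0.5, rng))  # 0.9 plus noise
```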

Sampling Steps

Next thing was to try different number of sampling steps, from 2 to 99:

I find it fascinating how some buildings disappear and are replaced by others at certain thresholds here. Even though 75 steps are more expensive to run than 10, if you are looking for results like the 75-step picture there is no point in generating multiple images with 10 steps. To my amateur eye, more steps give more detail and more sharpness.
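In other words, the step count is a quality knob, not a lottery ticket. A back-of-the-envelope comparison at my measured speed:

```python
def batch_seconds(n_images, steps, seconds_per_step=10):
    """Total time to generate n images at a given step count
    (10s/step is my measured speed, not a general constant)."""
    return n_images * steps * seconds_per_step

# Seven 10-step drafts cost about the same as one 75-step image,
# but the drafts will not converge toward the 75-step result.
print(batch_seconds(7, 10))  # 700
print(batch_seconds(1, 75))  # 750
```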

Guide

There is a guide parameter (how strongly the image should follow the prompt), and it is not a very obvious one. For this purpose I used 30 sampling steps and tried a few guide values (0–15 are the allowed values):

To my amateur eye, guide seems to be mostly about contrast and sharpness. I cannot see that the pictures resemble my prompt more or less.
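For what it is worth, the guide value is the classifier-free guidance scale: at every step the model predicts the noise twice, once with the prompt and once without, and the guide controls how far the result is pushed from the unconditional prediction toward the prompted one. Schematically, with plain numbers standing in for whole latent tensors:

```python
def guided_prediction(uncond, cond, guide):
    """Classifier-free guidance: push the unconditional prediction
    toward the prompt-conditioned one by a factor of `guide`."""
    return uncond + guide * (cond - uncond)

print(guided_prediction(1.0, 3.0, 0.0))  # 1.0 - the prompt is ignored
print(guided_prediction(1.0, 3.0, 1.0))  # 3.0 - the conditioned prediction as-is
print(guided_prediction(1.0, 3.0, 7.5))  # 16.0 - pushed far beyond it
```

That overshooting at high guide values is commonly reported to show up as extra contrast and saturation, which would fit the observation above.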

Resolution

I generated 6 images using different resolutions. Sampling Steps is now 20.

To my surprise the resolutions below 512×512 came out OK; I have had very bad results at lower resolutions before. It is obvious that changing the resolution creates a different picture, as if using a different seed with the same prompt. The smaller pictures are faster and the larger ones slower to generate (as indicated by the %), and the largest image caused my 16GB computer to use its swap (but I think something else was swapped out). My conclusion is that you cannot generate many pictures at low resolution and then regenerate the ones you want at higher resolution with the same seed (there are probably other ways to upscale).
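Two details help explain this, both stated as my understanding of how Stable Diffusion 1.x works in general rather than anything verified against this particular branch: the model denoises a latent image downsampled 8× from pixel space, and the initial noise tensor has a different shape at each resolution, so the same seed produces unrelated noise and therefore an unrelated picture. Cost also grows with the pixel count:

```python
def latent_shape(width, height, channels=4, factor=8):
    """Stable Diffusion 1.x denoises a latent tensor downsampled 8x
    from pixel space, with 4 latent channels (my understanding of SD 1.x).
    A different resolution means a differently shaped noise tensor."""
    return (channels, height // factor, width // factor)

def relative_cost(width, height, base=(512, 512)):
    # Per-step time and memory grow roughly with pixel count
    # (the attention layers grow faster still).
    return (width * height) / (base[0] * base[1])

print(latent_shape(512, 512))   # (4, 64, 64)
print(relative_cost(768, 768))  # 2.25
print(relative_cost(256, 256))  # 0.25
```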

Image type

So far all images have been “fantasy art”. I tried a few alternatives with 20 Sampling Steps:

This changes a lot. The composition is similar but the architecture is entirely different. What if I like a drawing with the roof style of fantasy art?

Artists

So far I have been using Greg Rutkowski for everything (at the first opportunity I will buy a collection of Greg Rutkowski’s work – so far I have not found any). How about different artists:

Obviously, picking a suitable artist is critical for your result. To my surprise, for my purposes Anders Zorn is probably more useful than Boris Vallejo.

Dropping Keywords

So far I have not seen many wells or markets in my pictures. What about dropping those keywords from the prompt?

The composition is somewhat similar, and still no wells or markets.

Model Choice

There is a 1.4 model to download, and a larger (full) version. What is the difference? I tried three prompts (all fantasy art in the style of greg rutkowski):

  • old well in medieval village
  • medieval village on cliff
  • medieval village under cliff

The conclusion here is that the result is slightly different depending on the model, but it does not make a huge difference in quality or preference.

Trying to get a well

Not giving up on getting a picture of a well, I made 9 pictures using different seeds and the prompt:

  • fantasy art old well in medieval village, greg rutkowski

None of them contains a well as I think of a well. If I do an image search on Google I get plenty of what I want. Perhaps Stable Diffusion does not know what a well looks like, or perhaps this is how fantasy art and/or Greg Rutkowski would draw a well.

Conclusion

I did this because I thought I could learn something, and I did. Perhaps you learnt something from reading about my results. It is obviously possible to get cool pictures, but what if you want something specific? The prompt is important, but if you are playing with the wrong parameters you may be wasting your time.

Stable Diffusion CPU-only

I spent much time trying to install Stable Diffusion on an Intel NUC Hades Canyon with Core i7 (8th Generation) and an AMD RX Vega (4GB), with no success. 4GB is tricky. AMD is trickier.

I gave up on my NUC and installed it on my Windows laptop with a GeForce GTX 1650. That worked, and a typical image (512×512 and 20 samples) takes about 3 minutes to generate.

For practical reasons I wanted to run Stable Diffusion on my Linux NUC anyway, so I decided to give a CPU-only version of Stable Diffusion a try (stable-diffusion-cpuonly). It was a pretty easy install, and to my surprise generation is basically as fast as on my GeForce GTX 1650. I have 16GB of RAM and that works fine for 512×512. I think 8GB would be too little, and as usual, resolutions lower than 512×512 generate very bad output for me.

So when you read that “Stable Diffusion requires an Nvidia GPU with at least 4GB of RAM”, know that for simple hobby purposes any computer with 16GB of RAM will be fine.

Stable Diffusion, GeForce GTX 1650 4GB, Windows 10

I have been trying to get Stable Diffusion working. My Linux workstation has an AMD GPU and that did not work well. I have a Dell laptop with a GeForce GTX 1650 with 4GB of video RAM, running Windows 10, and I managed to get Stable Diffusion working as expected.

I used this guide.

The key success factors for this computer are:

  • Use an “optimized” branch, that allows you to generate 512×512 with only 4GB VRAM
  • Don’t generate smaller than 512×512 (at least not 256×256)
  • Check [x] full_precision

I wasted a lot of time with non-optimized versions of Stable Diffusion, where I had to go to 256×256 to generate anything, and that anything was always garbage.

It takes about 3 minutes to generate an image with default settings.