johnnys.news

by Joachim Leonfellner ✌️ i'm an indie dev in ❤️ with side-projects

Dec 17, 2022 · 1018 words in 7 min


how to play with Stable Diffusion 2.1 locally

Stable Diffusion

I’m pretty sure that at this point most tech-savvy people on the internet have heard about or seen some AI-generated art. There are different types of models that can be used to generate image data. I described GANs (generative adversarial networks) in a previous post, where I used StyleGAN to generate Bored Apes. Other approaches for generating image data are VAEs (variational autoencoders) and diffusion models. In this post I will focus on the latter, and especially on Stable Diffusion.

There are other models that can be used for image generation, but I especially like Stable Diffusion because it’s open source and its license claims no rights on generated images, leaving users free to use them however they want. This is a big plus for me because I want to be able to use the generated images for my own purposes. The repository for Stable Diffusion can be found on GitHub at https://github.com/Stability-AI/stablediffusion and you can experiment with a hosted version on Hugging Face at https://huggingface.co/spaces/stabilityai/stable-diffusion. At the time of writing the latest version is 2.1, which is also the one I will be using.

Since the code and model are publicly available, you can also run the model locally on your own hardware, which is what I will show you in this post. I’m using a Windows 11 system, but the setup should be similar on other operating systems.

Prerequisites

  • Git
  • Python 3.10.6
  • Discrete GPU with >4GB VRAM
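
A quick way to check these prerequisites is to run the version commands in a terminal (nvidia-smi assumes an NVIDIA card, and the exact output will of course differ on your machine):

git --version
python --version
nvidia-smi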

Setup

There are plenty of parameters that can be set when interacting with Stable Diffusion, so I find it useful to have a GUI that simplifies this process. I’m using stable-diffusion-webui, a web interface for Stable Diffusion. It can be found on GitHub at https://github.com/AUTOMATIC1111/stable-diffusion-webui.

Stable Diffusion Web-UI

Let’s start by cloning the repository:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

In the root of the repo we can find the models folder. This is where we will place the model files. I’m using version 2.1 of Stable Diffusion which can be downloaded here https://huggingface.co/stabilityai/stable-diffusion-2-1.

download model file

There are different checkpoints available for different resolutions and features. I’m using 768-v-ema.ckpt for my experiments. In addition to the actual model file we also need the matching configuration file from the Stable Diffusion repository https://github.com/Stability-AI/stablediffusion/tree/main/configs/stable-diffusion. I downloaded v2-inference-v.yaml and placed it in the same directory as the model file.
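
For reference, here is a sketch of where the files end up. As far as I can tell the web-ui expects checkpoints in the models\Stable-diffusion subfolder and the .yaml needs the same base name as the checkpoint to get picked up - adjust the paths and file names if you downloaded a different variant:

move 768-v-ema.ckpt stable-diffusion-webui\models\Stable-diffusion\
move v2-inference-v.yaml stable-diffusion-webui\models\Stable-diffusion\768-v-ema.yaml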

Now we can start the web-ui by executing the webui-user.bat file. This will set up the local environment and start the web-ui. The first run can take a while because the script will download all required dependencies. After everything is set up you should see a message like this in your terminal:

Terminal - showing the local address for web-ui
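
A side note on webui-user.bat: it contains a COMMANDLINE_ARGS variable where you can pass launch options to the web-ui. If your GPU sits close to the 4GB VRAM minimum, a flag like --medvram should, as far as I know, trade some generation speed for lower memory usage - treat this as a suggestion and check the wiki linked below for the current list of options:

set COMMANDLINE_ARGS=--medvram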

Let’s launch a browser and open the URL that is shown in the terminal (by default the web-ui listens on http://127.0.0.1:7860). You should see the web-ui:

Stable Diffusion Web UI

There are plenty of options to play around with. I would recommend having a look at the wiki page, where you can find descriptions of all parameters: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features. The most important feature is of course txt2img, where you enter a description of the image you want to generate. There is also a negative prompt where you can add text for things you don’t want to see in the generated image. A fun feature here is the 🎨-button, which adds a random artist name to the prompt for inspiration.

basic configuration

Below the prompt you can see Sampling Steps, which defines how many times the image is refined during the generation process. A higher number will in general produce a better output, but it will also increase the compute time.

The next section is for image dimensions. You can set the Width and Height of the output - keep in mind that a higher resolution will also increase the generation time.

The next important settings in the web-ui are Batch Count and Batch Size. Batch Count defines how many batches of images are generated one after another, while Batch Size controls how many images are generated in parallel within each batch. If you have beefy hardware you can increase the Batch Size to speed up the generation process.

CFG Scale is also an interesting parameter. It defines how strictly the model should follow the prompt. With a lower value Stable Diffusion has more freedom to generate images that are not exactly what you described in the prompt. A higher value makes the model more strict and produces images that follow the prompt more closely.

If you are generating portraits or images of people you can check the Restore Faces checkbox, which can help generate more realistic faces.

If you want different results with each run you should use a random Seed value by clicking the 🎲-button or simply entering -1.
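
All of these settings can also be driven from a script instead of the UI: as far as I know the web-ui exposes an HTTP API when you add --api to the launch arguments. Here is a rough sketch with curl - the endpoint, the field names and the default address are assumptions based on my setup, so double-check them against the wiki (unix-style quoting, adapt it for cmd or PowerShell):

curl http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a watercolor painting of a frog playing super nintendo", "negative_prompt": "blurry, low quality", "steps": 20, "cfg_scale": 7, "width": 768, "height": 768, "batch_size": 1, "seed": -1}'

The response is a JSON object with the generated images as base64-encoded strings, so a small script is still needed to decode and save them.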

my examples

There are of course many more parameters and features that you can play around with, but I think this is a good starting point. Have fun generating some images - here are some of my results:

a metro train flying over the skyline of paris

a watercolor painting of a frog playing super nintendo on a retro tv

a small modern house in the nature which only consists of balconies