This post is part of a series that covers many important aspects of Automatic1111 Stable Diffusion WebUI. Consider visiting the knowledge hub for other resources.

Beginner’s Guide to Automatic1111 Stable Diffusion Web UI

In the rapidly evolving world of AI-generated art, latent text-to-image diffusion models have taken center stage, offering a versatile way to create stunning visuals from textual prompts.

However, to harness the full potential of open-source models like Stable Diffusion, a user-friendly interface is essential – unless, of course, you are extremely comfortable with programming languages like Python. Enter the Automatic1111 Stable Diffusion WebUI – an open-source project that simplifies interaction with Stable Diffusion models, transforming complex commands and parameters into an approachable, graphical experience.

Want the virtual tour? Watch our video that walks you through all the core components of WebUI.

Minimum Requirements to Run WebUI

While the WebUI repository does not explicitly state system requirements for running the application, you’ll want at least the following when working with Stable Diffusion models:

  • 16GB RAM
  • 10GB of free disk space for models and generated images
  • A GPU (NVIDIA with 2GB VRAM minimum; however, 4GB VRAM or more is recommended for better performance)

Note: If you don’t have a GPU, you should still be able to install and use WebUI, but the image generation process will be significantly slower as it will rely on CPU processing. We wrote a guide specifically for Mac users who need to use CPU-only mode.

If you are curious about the performance of your machine, we recommend checking out this shared spreadsheet that has user-provided benchmarks.

Compatible Models

WebUI works well with any Stable Diffusion model, including 1.x, 2.x, and the new XL models. Additionally, fine-tuned models available privately or on sites like Civitai can also be used with WebUI.

Tour Guide

Given that WebUI is an open-source project, it is constantly being updated and improved. This high-level tour covers the notable sections so you know what to expect from each one. We’ll only lightly touch on the parameters and what they do – however, know that there is a lot of code being executed behind the scenes with each parameter you change.

Text-to-Image (txt2img)

At its simplest level, txt2img is where you can input a text prompt, and WebUI will use the Stable Diffusion model to generate an image based on that text.

You can also set important parameters like the negative prompt, sampling method, dimensions, and more.

Expect to spend a lot of time in this section if you just want to generate brand-new illustrations from text prompts. For example, did you ever want to see what a super saiyan frog would look like?

Or perhaps you’ve pondered the aesthetics of a Victorian-era cityscape blended with cyberpunk elements.

No need to imagine – here’s what we were able to make in a matter of seconds with txt2img:

Pretty cool, eh?

Again, this is just scratching the surface, but if you are looking for an alternative to services like DALL-E, Midjourney, or DeepAI, the txt2img feature of WebUI offers a robust and customizable experience that can yield high-quality results. It also lets you tweak a multitude of parameters to fine-tune your images to a degree not typically offered by more streamlined services.

Core Parameters

Within txt2img, you’ll mostly be playing with the following parameters (a small code sketch follows the list):

  • Prompt: Here you’ll describe the image you want to generate. The text you provide is broken into tokens and converted into embeddings that the model interprets to generate an image matching your description as closely as possible.
  • Sampling Method: The algorithm used to denoise the image during the generation process. There are various methods like Euler a, Euler, DPM2, and Heun, each with its own unique characteristics.
  • Sampling Steps: How many times an algorithm iterates to refine the image. A higher number of steps will divide the process into finer intervals, so the model won’t have to take larger “leaps” in generating the details of the image. This often leads to a more coherent and detailed result but also requires more computational power and time.
  • Hires. fix (recommended for GPU-enabled machines only): Upscales the image in a two-step process: the image is first generated at a lower resolution, then upscaled and refined with additional sampling.
  • CFG Scale: This informs the model how closely it should follow your prompt – think of it as roughly the opposite of the “temperature” setting in large language models. A higher CFG value will more strictly adhere to the prompt details, whereas a lower CFG value will allow more randomness and potentially more creative interpretations of your prompt.
  • Seed: A numerical value that ensures the reproducibility of the image. When you input a specific seed number, the model uses it to generate the initial random noise, so the image creation process starts from the same state every time that seed is used.
  • Width & Height: Defines the output dimensions of the generated image. Start with the model’s native resolution (512×512 for 1.x models, 768×768 for 2.x, 1024×1024 for XL) unless you have a reason to change it.
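
For a sense of how these parameters map to an actual generation call, here’s a minimal sketch that hits the local WebUI API. It assumes you launched WebUI with the --api flag on the default address (http://127.0.0.1:7860) and that the /sdapi/v1/txt2img endpoint matches your version – verify both against your own install.

```python
# Minimal sketch: generating an image via a locally running WebUI started with --api.
# Endpoint and field names are assumptions based on the WebUI API; check them on your version.
import base64

import requests

payload = {
    "prompt": "a super saiyan frog, digital art, highly detailed",
    "negative_prompt": "blurry, low quality",
    "sampler_name": "Euler a",   # Sampling Method
    "steps": 25,                 # Sampling Steps
    "cfg_scale": 7,              # how closely to follow the prompt
    "seed": 1234567890,          # fixed Seed for reproducibility (-1 = random)
    "width": 512,                # native resolution for 1.x models
    "height": 512,
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# Images come back as base64-encoded strings; strip a data-URI prefix if present.
image_b64 = response.json()["images"][0].split(",", 1)[-1]
with open("txt2img_result.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```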

Img2Img

Rather than starting from scratch with just a text prompt, img2img allows users to provide an initial image which the Stable Diffusion model then takes as a base to generate a new image based on additional textual input.

If you wanted to bring a child’s drawing to life, you could simply upload their sketch, input a descriptive text prompt to guide the AI, and WebUI will transform the drawing into a more detailed and polished piece of artwork.

Here we show how a simple line drawing can be turned into a vivid, detailed masterpiece using img2img:

Source of original image: https://unsplash.com/photos/a-childs-drawing-of-a-flower-with-crayons-hv9TZAJaaY4

This mode is particularly useful for artists and creators who want to iterate on an existing piece of art or transform a photograph into a different art style.

Aside from altering existing images, img2img is also commonly used to extend the edges of a composition, referred to as outpainting, to create a larger canvas while maintaining the style and content of the original work.

We have a post that details the outpainting process in full if you are interested in this.

Core Parameters

img2img has several core parameters that users can adjust to achieve the desired results (refer to the previous section for parameters that may overlap with other modes); a small code sketch follows the list:

  • Prompt: The prompt in img2img is similar to the one used in text-to-image generation, but instead of creating an image from scratch, it guides the AI in modifying or enhancing the existing image. This can include adding new elements, changing the style, or adjusting the mood of the image.
  • Resize Mode: How the original image’s dimensions are adjusted as it is processed. You can choose to simply resize, crop and resize, resize and fill, or resize using latent upscale.
  • Denoising Strength: This slider determines how much noise is added to your uploaded image during generation. A higher value produces a result that departs further from the original, whereas a low value makes only small changes.
  • Refiner: Change to a refining model at a predetermined point in the process, which allows you to enhance details without dramatically altering the core composition of the image.
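
Here’s a minimal sketch of how an img2img call might look through the local API, under the same assumption that WebUI was launched with --api; the /sdapi/v1/img2img endpoint and its field names (init_images, resize_mode, denoising_strength) should be checked against your version.

```python
# Minimal sketch: transforming an existing image via the img2img endpoint of a
# WebUI instance started with --api. File names are hypothetical placeholders.
import base64

import requests

with open("childs_drawing.png", "rb") as f:  # hypothetical input file
    init_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_image],     # the base image to work from
    "prompt": "a vivid, detailed painting of a flower, vibrant colors",
    "resize_mode": 0,                # 0 = just resize
    "denoising_strength": 0.6,       # higher = further from the original
    "steps": 30,
    "cfg_scale": 7,
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
response.raise_for_status()

image_b64 = response.json()["images"][0].split(",", 1)[-1]
with open("img2img_result.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```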

Inpainting

While photo editing tools like Photoshop have long been the standard for image manipulation, the inpainting feature in WebUI provides a different approach. Inpainting allows you to select a portion of an image to be replaced or altered while considering the context of the surrounding area.

This is particularly useful in many different ways.

When generating AI images, the model may not always get everything right on the first shot. For example, there could be an extra finger, disproportionate features, or unwanted artifacts present in the image. But just because the model didn’t get one small feature correct doesn’t mean you need to start over from scratch.

The inpainting feature allows you to correct portions of an image that you aren’t happy with.

Here we’ll test out a few different faces on a portrait:

WebUI makes inpainting fairly easy: simply brush (mask) the area you want reworked, then use the available parameters to tell the model how it should interpret and fill in the selected area.

If you want to learn more about inpainting, we put together an entire guide with all the parameters and techniques you can use to get the best results.

Core Parameters

With inpainting, the parameters revolve around the mask selection. Here’s what you would typically find, followed by a small code sketch:

  • Mask Blur: Applies a Gaussian blur to the edges of the mask, which can create a smoother transition between the inpainted portion and the untouched image.
  • Mask Mode: Sets the selection to the mask or the inverse of the mask, allowing you to choose which part of the image you want to affect.
  • Masked Content: What the masked area is filled with before generation begins (fill, original, latent noise, or latent nothing), which controls how much of the original content the model starts from.
  • Inpaint Area: Whether the model works on the whole picture or only the masked region. If ‘only masked’ is selected, a padding value controls how many extra pixels around the mask are included for context when the model fills in the gap.
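
Inpainting goes through the same img2img endpoint, with a mask added to the payload. The sketch below maps the parameters above to API field names (mask_blur, inpainting_mask_invert, inpainting_fill, inpaint_full_res, inpaint_full_res_padding); these reflect the WebUI API as we understand it and can differ between versions, so treat them as assumptions to verify.

```python
# Minimal sketch: inpainting by adding a mask to an img2img request.
# Requires WebUI launched with --api; file names are hypothetical placeholders.
import base64

import requests

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [b64("portrait.png")],   # source image
    "mask": b64("face_mask.png"),           # white = area to regenerate
    "prompt": "portrait of a woman with short red hair, photorealistic",
    "mask_blur": 4,                  # Mask Blur: soften the mask edges
    "inpainting_mask_invert": 0,     # Mask Mode: 0 = inpaint masked, 1 = inpaint not masked
    "inpainting_fill": 1,            # Masked Content: 1 = start from the original content
    "inpaint_full_res": True,        # Inpaint Area: work on the masked region only
    "inpaint_full_res_padding": 32,  # padding pixels of surrounding context
    "denoising_strength": 0.75,
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
response.raise_for_status()

image_b64 = response.json()["images"][0].split(",", 1)[-1]
with open("inpaint_result.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```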

Extras

When working with Stable Diffusion models, it is best to work at the native resolution at which the model was trained; typically, this is 512×512 pixels for 1.x models, 768×768 pixels for 2.x models, and 1024×1024 for XL models.

Within the Extras tab of WebUI, you can upscale these small images to larger resolutions without losing significant detail or introducing excessive artifacts.

WebUI offers some helpful options to upscale either an individual photo or an entire batch of photos. Not only that, but upscaling can work well on any picture you may have, including family photos, artwork, or even historical photographs that you’d like to see in a higher resolution.
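
If you’d rather script the upscaling step, here’s a minimal sketch using the Extras endpoint (/sdapi/v1/extra-single-image) of a WebUI instance launched with --api. The upscaler name is an assumption – use whichever upscaler appears in your own Extras tab.

```python
# Minimal sketch: upscaling a single image through the Extras API endpoint.
# Endpoint, field names, and the upscaler name are assumptions to verify on your install.
import base64

import requests

with open("family_photo.png", "rb") as f:  # hypothetical input file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,
    "upscaling_resize": 2,          # scale factor (2x)
    "upscaler_1": "R-ESRGAN 4x+",   # pick any upscaler listed in your Extras tab
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload)
response.raise_for_status()

with open("upscaled.png", "wb") as f:
    f.write(base64.b64decode(response.json()["image"]))
```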

PNG Info

Every image you generate with WebUI has all the parameters baked into the image. If you ever want to work on an image again at a future date and don’t remember all the parameters, the PNG Info tab can extract them and send them to the relevant tab within the interface, allowing you to either reproduce the same image or tweak the parameters for a variation.

This feature is invaluable for workflow optimization and for maintaining consistency across a project where multiple iterations of an image are required.
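
Those baked-in settings live in a PNG text chunk named “parameters”, so you can also read them outside of WebUI. Here’s a minimal sketch using Pillow; the chunk name matches what WebUI writes by default, but older or post-processed images may not carry it.

```python
# Minimal sketch: reading the generation settings embedded in a WebUI PNG.
from PIL import Image

img = Image.open("txt2img_result.png")  # any image generated by WebUI

# PNG text chunks are exposed through the image's info mapping.
params = img.info.get("parameters")

if params:
    print(params)  # prompt, negative prompt, steps, sampler, CFG scale, seed, size, model hash...
else:
    print("No embedded parameters found - was this image generated by WebUI?")
```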

Extensions

Aside from image generation, WebUI can be extended through extensions and scripts developed by the community, allowing you to test prompts, swap faces, create videos, use ControlNet guidance, enhance the details of a generated image, and more.

Other Tabs (For More Advanced Purposes)

While the above is more than enough to get a beginner started with creating and manipulating images using Stable Diffusion, more advanced users might find themselves exploring the additional tabs and features that WebUI offers. These include the Train, Checkpoint Merger, and Settings tabs.

If you have a powerful GPU and want to make your own models, then the Train tab is where you will spend a significant amount of time. Automatic1111 has a helpful guide on all the parameters and workflow that you should read before starting.

The Checkpoint Merger is a terrific way to blend models together or make your own inpainting models.

Lastly, there are many settings within WebUI you can change to customize it to your personal preferences or to optimize performance based on your system’s hardware capabilities.

Hidden Features

Up to this point, we have only discussed the user interface; however, there are even more options available within the underlying system that powers WebUI for Stable Diffusion, such as API calls and command-line arguments.

These extend the functionality of the tool to a programmatic level where you can automate processes, create custom pipelines, or integrate the Stable Diffusion capabilities into other applications or systems – great for businesses, power users, or developers.
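
As a small taste of that programmatic level, here’s a sketch that lists the available models and samplers and switches the active checkpoint over the API. The endpoints shown (/sdapi/v1/sd-models, /sdapi/v1/samplers, /sdapi/v1/options) come from the WebUI API and can change between versions – the interactive documentation at http://127.0.0.1:7860/docs on an --api-enabled instance is the authoritative reference for your install.

```python
# Minimal sketch: inspecting and configuring a WebUI instance programmatically.
# Endpoints and response fields are assumptions to check against your version's /docs page.
import requests

base = "http://127.0.0.1:7860"

models = requests.get(f"{base}/sdapi/v1/sd-models").json()
samplers = requests.get(f"{base}/sdapi/v1/samplers").json()
print("Models:", [m["model_name"] for m in models])
print("Samplers:", [s["name"] for s in samplers])

# Switch the active checkpoint by updating the relevant option, then generate as usual.
resp = requests.post(f"{base}/sdapi/v1/options", json={"sd_model_checkpoint": models[0]["title"]})
resp.raise_for_status()
```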

Alternatives

As we said at the start, WebUI is just one application for interacting with Stable Diffusion models. Many alternatives have quickly emerged, each with its own unique features and user interface.

Here are a few of the more popular open-source alternatives that can work with Stable Diffusion models:

New options are being released all the time, so be sure to check out open-source repositories on GitHub to see what’s new.