Inpainting with Stable Diffusion: Explaining the Parameters

Watch the accompanying video for this post.

Very rarely when working with Stable Diffusion models will you get the perfect image with a single prompt. Whether it’s a disfigured face, too many fingers, added objects, artifacts, or anything else, inpainting can quickly fix these problems.

However, like most aspects of working with Stable Diffusion models, trying to get the results you are looking for may take some trial and error. This article explains the basics of inpainting, the different parameters, and more.

Inpainting Models (Important)

Chances are that you generated an image with a standard model, be it Stable Diffusion 1.x, 2.x, or a fine-tuned model available on sites like Civitai. However, these models are not designed to inpaint images; while they can work, they often require a lot of adjustments.

Inpainting models are specially trained to…you guessed it…inpaint images. They perform exceptionally well, are more beginner-friendly, and will require less tweaking to get the desired results. Many models that you use to generate images have accompanying inpainting models.

When visiting a model’s page, select its “Inpainting” version (make sure it matches the version of the model you are using), and then download it:

Note: Although inpainting models are nearly identical in size to their generation counterparts, they should not be used for regular image generation.

How the Many Different Inpainting Parameters Affect the Process

Just like image generation, there are many parameters to consider when inpainting. Here’s a summary of the more popular ones and how they impact the output.

Resize Mode

If you are making changes to the scale of the image, then you have four options: just resize, crop and resize, resize and fill, and just resize (latent upscale).

If your source image is 512 × 768 and you resize it to a different width and height, here’s how each option may affect the outcome:

Just Resize: Scales the image to the new width and height without cropping any part of the composition. If the new aspect ratio differs from the original, the image will be stretched or squashed.
Crop and Resize: Trims the edges to match the new aspect ratio, then resizes. The result depends on the values you enter in the ‘Resize to’ or ‘Resize by’ fields further down in the UI.
Resize and Fill: Resizes the image to fit within the new dimensions and, if necessary, fills the leftover area with pixels generated by Stable Diffusion.
Just Resize (latent upscale): Resizes in latent space instead of pixel space, with the aim of reducing quality loss while resizing. However, this option may currently be buggy.
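To make the first three modes more concrete, here is a minimal Python sketch that only computes the geometry of each mode for a 512 × 768 source resized to 768 × 768. The function names are hypothetical and the math is our own simplified model of the behavior described above, not code taken from AUTOMATIC1111.

```python
# Hypothetical helpers: they only compute geometry, no pixels are touched.

def just_resize(src_w, src_h, dst_w, dst_h):
    """Stretch to the new size. Returns the (x, y) scale factors;
    unequal factors mean the image is distorted."""
    return dst_w / src_w, dst_h / src_h


def crop_and_resize(src_w, src_h, dst_w, dst_h):
    """Scale until the image covers the new size, then center-crop the overflow.
    Returns the (left, top, width, height) part of the source that is kept."""
    scale = max(dst_w / src_w, dst_h / src_h)
    keep_w, keep_h = round(dst_w / scale), round(dst_h / scale)
    return (src_w - keep_w) // 2, (src_h - keep_h) // 2, keep_w, keep_h


def resize_and_fill(src_w, src_h, dst_w, dst_h):
    """Scale until the image fits inside the new size. Returns the fitted size
    and the total (x, y) border left over for Stable Diffusion to fill."""
    scale = min(dst_w / src_w, dst_h / src_h)
    fit_w, fit_h = round(src_w * scale), round(src_h * scale)
    return (fit_w, fit_h), (dst_w - fit_w, dst_h - fit_h)


# A 512 x 768 source resized to a square 768 x 768 output
print(just_resize(512, 768, 768, 768))      # (1.5, 1.0): stretched horizontally
print(crop_and_resize(512, 768, 768, 768))  # (0, 128, 512, 512): 128 px trimmed top and bottom
print(resize_and_fill(512, 768, 768, 768))  # ((512, 768), (256, 0)): 256 px of width to fill
```

The unequal scale factors from just_resize are exactly the distortion mentioned above, while the other two modes trade that distortion for lost or newly generated edges.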

Mask Blur

This option applies a Gaussian blur to the mask. The higher the value, the softer the edges of the brush: the masked area spreads out further while its strength is reduced. Set it too high and you won’t see any changes in your illustration.

Here’s an illustrative example of what mask blur is doing to your masked selection:

The default of 4 is typically a good starting point.
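The trade-off between spread and strength can be seen on a single row of a mask. This minimal sketch blurs a narrow 4-pixel brush stroke with a 1-D Gaussian. It assumes the mask blur value acts as the Gaussian’s standard deviation, which is a simplification for illustration; the helper names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel covering roughly +/- 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_mask_row(row, sigma):
    """Blur one row of a 0/1 mask; pixels past the edges repeat the edge value."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(row) - 1)  # clamp to the row
            acc += w * row[j]
        out.append(acc)
    return out


mask = [0] * 8 + [1] * 4 + [0] * 8  # a narrow 4-pixel brush stroke
for sigma in (1, 4):
    # Larger sigma: nonzero values reach further out, but the peak drops
    print(sigma, [round(v, 2) for v in blur_mask_row(mask, sigma)])
```

With the larger value the mask covers more pixels, but no pixel stays near full strength, which is why an overly high blur on a small mask can leave the illustration almost unchanged.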

Mask Mode

You have two modes: inpaint masked and inpaint not masked.

Inpaint masked means that the area you paint with the brush will be regenerated. Inpaint not masked inverts the selection: everything outside the painted area is regenerated, while the painted area itself is left untouched.

This is helpful if, for example, you paint over a character in your illustration and then select ‘Inpaint not masked’ to change the entire background.

Here’s an example of what inpaint masked and inpaint not masked look like:

Inpaint Masked

Inpaint Not Masked
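Conceptually, the two mask modes differ only in which pixels the final blend keeps. Here is a minimal sketch on a tiny grayscale grid, assuming a simple per-pixel blend where a mask value of 1.0 keeps the generated pixel and 0.0 keeps the original; the function names are hypothetical and this is not the web UI’s actual compositing code.

```python
def invert_mask(mask):
    """'Inpaint not masked': regenerate everything the brush did NOT cover."""
    return [[1.0 - m for m in row] for row in mask]


def composite(original, generated, mask):
    """Per-pixel blend: 1.0 keeps the generated pixel, 0.0 keeps the original.
    Values in between (e.g. from mask blur) mix the two."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[10, 10, 10], [10, 10, 10]]
generated = [[99, 99, 99], [99, 99, 99]]
brush = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # the "character" is the middle column

print(composite(original, generated, brush))               # only the middle column changes
print(composite(original, generated, invert_mask(brush)))  # everything except the middle column changes
```

The same brush stroke therefore protects the character in one mode and targets it in the other.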

Masked Content

When inpainting, you’ll have the following options for masked content: fill, original, latent noise, and latent nothing.

These options can all be confusing, but we’ll do our best to explain them here:

Fill: This fills the masked area with a blurred average of the surrounding colors in the image and generates new content from there. This helps to retain consistency and style, but will produce different results from what was there originally. Here’s a video of the generation process:

Notice how things like reflection, color, and composition still fit the scene.

Original: Used in the majority of cases, this mode starts from the original content of the masked area and generates new output from it, so the context of the original image stays the same.

Good if you want to retain most of the existing features and only make small adjustments.

Latent Noise: This option doesn’t respect the original image at all and provides completely new art in the masked area.

Just as in the original image generation process, the masked area of the latent space is filled with random noise, which the model then denoises before the result is decoded back into an image.

When selecting this option, you will want to change the prompt completely so it is specific about what you want to add. For example, if you are just fixing the mouth, your prompt may only be ‘a smile with teeth showing’ and not the original prompt.

A good use of this mode would be to change core features of a character. For example, if your original illustration had a person looking to the side, you could change the prompt to something like: ‘20 year old woman, smiling, facing viewer.’

Latent Nothing: This one is confusing at first. While latent noise fills the space with random noise, latent nothing fills the masked area with zeros.

This allows the Stable Diffusion model to generate new art in the masked area that is wildly different from what was there previously. As with latent noise, you want to change your prompt to steer the model in the right direction.

Otherwise, if you leave the prompt the same (in our case, ‘an astronaut riding a horse in a supermarket’), the model will try to draw a horse in the masked area, because it has no knowledge of the horse already in the original image:
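The four masked content options differ mainly in what the masked area contains before denoising starts. The sketch below re-initializes the masked cells of a toy 2-D grid according to each option as described above. It is a deliberate simplification: the real latent for Stable Diffusion 1.x has 4 channels at 1/8 of the image resolution, and in AUTOMATIC1111 fill works on the pixels (using blurred surrounding colors) rather than on a single average as assumed here. The function name is hypothetical.

```python
import random

MODES = ("original", "fill", "latent noise", "latent nothing")


def init_masked_area(grid, mask, mode, rng=None):
    """Return a copy of `grid` whose masked cells (mask == 1) are re-initialized
    the way each masked content option is described above. Unmasked cells are
    always kept as-is."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    rng = rng or random.Random()
    # Simplification of 'fill': one average of all unmasked cells
    kept = [v for row, m_row in zip(grid, mask) for v, m in zip(row, m_row) if not m]
    average = sum(kept) / len(kept) if kept else 0.0

    out = []
    for row, m_row in zip(grid, mask):
        new_row = []
        for v, m in zip(row, m_row):
            if not m or mode == "original":
                new_row.append(v)                    # keep what was there
            elif mode == "fill":
                new_row.append(average)              # colors from the rest of the image
            elif mode == "latent noise":
                new_row.append(rng.gauss(0.0, 1.0))  # random Gaussian noise
            else:
                new_row.append(0.0)                  # latent nothing: zeros
        out.append(new_row)
    return out


grid = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0, 1], [0, 0]]  # only the top-right cell is masked
for mode in MODES:
    print(mode, init_masked_area(grid, mask, mode, rng=random.Random(0)))
```

Only original keeps any trace of what was in the masked cell, which is why the noise and nothing options need a prompt that describes the new content.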

Inpaint Area

You’ll have two options here: whole picture and only masked. The option you select depends on what you want to achieve when inpainting.

Here’s a diagram that provides you a visual idea of what is going on with the different options:

Here’s what it looks like when Stable Diffusion is generating the image:

Only Masked

Whole Picture

Whole Picture: Select this option if you want to maintain overall consistency with the image. Think of it as a more controlled alternative to simply sending the image to the img2img tab. Whole picture processes the entire image but only applies changes to the area you masked.

This is helpful for removing small defects in the image that you want Stable Diffusion to try again to clean up.

Note: If you are inpainting a high-resolution image and select whole picture, the entire image will be downsized to fit the resize dimensions.

Only Masked: Rather than processing the entire image and applying changes only to the masked area (like the whole picture option above), only masked crops the region around the mask and generates it at the full resolution you set. Once generated, it is scaled back down and pasted into the image. This is extremely powerful for fixing small details without losing quality, or when working with high-resolution photos.

Only Masked Padding, Pixels: This sets how many pixels of the surrounding image are included around the mask when the region is cropped for generation.

This is helpful to provide the model with additional context of what is around the masked area when generating results. Here is a comparison between 32 and 128 pixel padding:

32 pixel padding

128 pixel padding

However, adding too much padding may reduce the quality of the newly inpainted section, since the same output resolution then has to cover a larger area.
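The padding trade-off comes down to simple arithmetic. This minimal sketch expands a mask’s bounding box by the padding, clamps it to the image, and computes how much the crop is enlarged to fill a 512 × 512 generation. The helper names and the example face box are hypothetical, and the sketch ignores details such as matching the crop to the output’s aspect ratio.

```python
def only_masked_region(mask_box, padding, image_w, image_h):
    """Expand the mask's bounding box by `padding` pixels, clamped to the image.
    mask_box is (left, top, right, bottom), with right/bottom exclusive."""
    left, top, right, bottom = mask_box
    return (
        max(left - padding, 0),
        max(top - padding, 0),
        min(right + padding, image_w),
        min(bottom + padding, image_h),
    )


def upscale_factor(region, target_w, target_h):
    """How much the cropped region is enlarged to fill the generation size
    (the smaller axis factor, so the whole crop fits)."""
    left, top, right, bottom = region
    return min(target_w / (right - left), target_h / (bottom - top))


face = (1000, 600, 1200, 800)  # a hypothetical 200 x 200 face in a 3072 x 2048 photo
for padding in (32, 128):
    region = only_masked_region(face, padding, 3072, 2048)
    print(padding, region, round(upscale_factor(region, 512, 512), 2))
```

With 32 pixels of padding the crop is 264 pixels wide and is enlarged about 1.94×; with 128 pixels it is 456 pixels wide and enlarged only about 1.12×, so the face itself gets fewer generated pixels in exchange for more context.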

Sampling Method & Steps

For most cases, the sampling method and steps can remain the same while inpainting. However, if you want further detail in an area, for example, a well-defined face, increasing the number of steps may help.

Keep in mind that the number of steps needed depends on the sampling method; some samplers require more steps than others to achieve the desired results.

Note: A very high step count (e.g., 150+) doesn’t mean the image will be better. Too many steps may even make the image look worse, as the model may begin to add details that weren’t there before.

Resizing

When you send an image to the inpaint tab, it will automatically set the resolution to the same size as the original image.

However, you may want to change these dimensions depending on the inpaint area you choose.

Inpainting High-Resolution Photos

If the image you are inpainting is high-resolution (e.g., 3072 × 2048), you likely can’t inpaint the entire image at that resolution, as it will probably cause out-of-memory errors on your machine.

Instead, set the width and height that best fit the model (e.g., 512×512 or 512×768 if you are using Stable Diffusion 1.x). AUTOMATIC1111 will automatically scale the image accordingly when inpainting is complete.

Always be sure to change the resolution and set the inpaint area to only masked when working with high-resolution images. Setting it to whole picture will downsize the entire image to fit the resized dimensions. This will result in a loss of quality.
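A quick calculation shows how large the quality gap can be. Assume, hypothetically, a 200-pixel-wide face in a 3072 × 2048 photo, an output size of 768 × 512 (the same 3:2 aspect ratio), and 32 pixels of only masked padding. The sketch below estimates how many generated pixels span the face in each mode; the function names are illustrative only, and the only masked case assumes a square crop.

```python
def whole_picture_face_pixels(image_w, image_h, target_w, target_h, face_px):
    """Whole picture: the entire image is shrunk to the target size, so the
    face is generated at the same reduced scale."""
    scale = min(target_w / image_w, target_h / image_h)
    return face_px * scale


def only_masked_face_pixels(face_px, padding, target_px):
    """Only masked: a square crop of face + padding on each side is generated
    at the full target size."""
    return face_px * target_px / (face_px + 2 * padding)


print(whole_picture_face_pixels(3072, 2048, 768, 512, 200))  # 50.0
print(round(only_masked_face_pixels(200, 32, 512)))          # 388
```

In whole picture mode the face is generated with only 50 pixels across and then has to be blown back up to 200, while only masked generates it with roughly 388 pixels across and scales it down, which is where the loss of quality comes from.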

Inpainting Small Images

If the image you are inpainting is small (e.g., 512×512), you can keep the resolution the same and not worry about performance issues.

CFG Scale

CFG stands for classifier-free guidance scale. It tells the model how closely it should follow the text prompt: higher values make the output more literal, while lower values give the model more freedom. A value somewhere between 6 and 15 usually works, as values that are too low or too high tend to produce poor images. The default value of 7 is usually a good starting point.

Here’s an entire article on the topic should you want to learn more.
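Under the hood, classifier-free guidance combines two noise predictions at every step: one made without the prompt and one made with it. The standard formula is guided = uncond + scale × (cond − uncond). This minimal sketch applies it to a few made-up numbers; in a real model these would be full latent tensors, and the function name is hypothetical.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional (no-prompt) estimate toward the prompt-conditioned one."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.10, -0.20, 0.05]  # made-up predictions without the prompt
cond = [0.30, -0.10, 0.05]    # made-up predictions with the prompt

for scale in (1, 7, 15):
    print(scale, [round(v, 2) for v in apply_cfg(uncond, cond, scale)])
```

At a scale of 1 the result is simply the prompt-conditioned prediction; larger values exaggerate the difference the prompt makes, which is why very high values can overshoot and produce poor images.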

Denoising Strength

This value controls how closely the model should follow the existing image versus how much it should create a new one. At 0, no changes will happen, and at 1, a very different image will be generated. Think of this as how creative you want the model to be when inpainting.

However, this is just the 10,000-foot view of this parameter. Under the hood, this sliding scale controls how much noise is added to the masked area and, consequently, how much work the model has to do to remove it. A higher value adds more noise, so more sampling steps are needed to denoise the image, which means a longer generation time.

In layman’s terms, the default value of 0.75 works well 😉

However, if you see your prompt showing up in the small masked area (for example, a horse’s head where the astronaut’s helmet should be), you may want to lower the denoising strength. Here’s a look at various denoising levels:

Denoising .20

Denoising .50

Denoising .80

Denoising 1.0

Sliding this down to a lower value adds less noise and makes it easier for the model to recognize that an astronaut’s helmet was originally there, not a horse’s head.
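Two small formulas capture what the slider does. The number of sampling steps actually run scales roughly with the strength; the sketch assumes the int(steps × strength) rule used by the Hugging Face diffusers img2img pipeline (AUTOMATIC1111 scales steps similarly by default, but has a setting to always run the exact number on the slider). The second function is the standard forward-diffusion mix of an original latent value with noise, where alpha_bar comes from the model’s noise schedule and gets smaller as the strength goes up. Both function names are hypothetical.

```python
import math


def img2img_steps(steps, denoising_strength):
    """Approximate number of sampling steps actually run for a given strength.
    Assumption: mirrors diffusers' int(num_inference_steps * strength)."""
    return min(int(steps * denoising_strength), steps)


def add_noise(x0, noise, alpha_bar):
    """Forward-diffusion mix of an original latent value x0 with Gaussian noise.
    Near alpha_bar = 1.0 the original survives almost untouched;
    near alpha_bar = 0.0 the value is almost pure noise."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise


for strength in (0.2, 0.5, 0.8, 1.0):
    print(strength, img2img_steps(20, strength), "of 20 steps")  # 4, 10, 16 and 20 steps
```

This is why a low strength both finishes faster and keeps more of the astronaut’s helmet: fewer steps run, and the starting point is still mostly the original latent.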

Batch Count & Batch Size

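Increase these if you want to compare multiple inpaintings side by side. Batch count is the number of batches generated one after another, while batch size is the number of images generated together in each batch (which uses more VRAM). Personally, I like to set the batch count to 4 and the batch size to 1, as it won’t take long to generate but gives me a good idea of the different results I can choose from.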

Seed

Keep this set to -1, which uses a random seed each time so the model generates new results for the masked area.

If you liked the result of one inpainting and want to get the same result again, set the seed to the value that was used for that result.
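The seed logic can be sketched in a few lines. This sketch assumes that -1 means “pick a random seed” and that each image in a run then gets the next consecutive seed, which mirrors what we understand AUTOMATIC1111 to do by default; the function name and the random range are illustrative assumptions.

```python
import random


def resolve_seeds(seed, batch_count, batch_size, rng=None):
    """-1 means 'pick a random seed'. Each image then gets the next consecutive
    seed, so any single result can be reproduced later from its own seed."""
    rng = rng or random.Random()
    if seed == -1:
        seed = rng.randrange(2**32)  # assumed range, for illustration
    return [seed + i for i in range(batch_count * batch_size)]


print(resolve_seeds(-1, 4, 1))    # four random-but-consecutive seeds
print(resolve_seeds(1234, 4, 1))  # [1234, 1235, 1236, 1237]
```

With a batch count of 4 and a batch size of 1, if the third result is the one you like, entering its seed (1236 in the second example) reproduces it, provided the other settings stay the same.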

Where Inpainting Goes Wrong

Inevitably, errors arise when inpainting. Here are some of the most common and what causes them:

An Additional Object Being Added: Seeing your prompt show up in a tiny masked area? Try one or more of the following: switch to an inpainting model, lower the denoising strength, make the prompt more specific, set the masked content to original, and set the inpaint area to whole picture if possible.
Lines Around the Masked Area: This often happens when not using an inpainting model. Increasing the mask blur can help soften the edges of the mask and reduce the lines.
Think in Progressive Generations: When working with Stable Diffusion models, you may sometimes need to work in progressive generations. For example, if you are trying to inpaint a face, you may need to first inpaint the face, then the eyes, then the mouth, etc. Or, if you want your character to hold a sword and they are holding something that resembles a sword, use that and adapt it further. Increasing the batch count can be your friend here.
Change Your Prompt: When inpainting, the prompt you carried over from the txt2img tab is a good starting point, but don’t be afraid to change it. Use the prompt to describe what you want to achieve in the masked area. If you aren’t sure, increase the only masked padding to give the model more context about what surrounds the masked area.