GPU Benchmarks
Performance comparisons for AI image generation with open source models
🙏 Huge shout-out to vladmandic for creating SD.Next and making this benchmark data possible!
What is it/s? Iterations per second (it/s) measures how fast a GPU generates AI images. Higher numbers = faster generation. A GPU with 20 it/s is twice as fast as one with 10 it/s. Learn more ↓
Compare GPUs
Filtered by SDXL: showing only benchmarks run with SDXL models, for a fair comparison
⚠️ Note: 10 selected GPUs have no SDXL benchmarks and are hidden from the chart (GeForce RTX 5090 D, GeForce RTX 4090, GeForce RTX 5090, GeForce RTX 4080 SUPER, GeForce RTX 4080, Radeon RX 7900 XTX, Radeon RX 7900 XT, GeForce RTX 5060 Ti, GeForce RTX 5060, Radeon RX 6700 XT)
Select GPUs to compare (10 selected):
- NVIDIA (51 models)
- AMD (28 models)
- Intel Arc (1 model)
Benchmark Details
No SDXL benchmarks are available for any of the 10 currently selected GPUs (see the note above).
Understanding Iterations Per Second (it/s)
Iterations per second (it/s) is the key metric for measuring GPU performance in AI image generation. It represents how many denoising steps the GPU can process each second when generating images.
Higher = Faster
A GPU with 20 it/s generates images twice as fast as one with 10 it/s. For a 20-step image, the 20 it/s GPU spends 1 second on denoising, while the 10 it/s GPU takes 2 seconds.
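The arithmetic is simple enough to sketch in a few lines of Python. The numbers below are illustrative; real end-to-end times also include model loading, prompt encoding, and VAE decoding.

```python
# Estimate denoising time for one image from a GPU's it/s rating.
def generation_time(steps: int, its_per_second: float) -> float:
    """Seconds of denoising for a single image (illustrative only)."""
    return steps / its_per_second

# A 20-step image: 20 it/s -> 1.0 s, 10 it/s -> 2.0 s of denoising.
for rate in (20.0, 10.0):
    print(f"{rate:>4.0f} it/s -> {generation_time(20, rate):.1f} s for 20 steps")
```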
Why the Range?
Performance varies based on batch size, resolution, model complexity, and settings. The range shows minimum to maximum performance across different configurations.
Technical Factors
🔷 Tensor Cores
Modern NVIDIA GPUs (RTX 20-series and newer) include specialized Tensor Cores designed for AI workloads. These cores excel at the matrix multiplication operations that power diffusion models, delivering 2-3x faster performance than standard CUDA cores alone. AMD's RDNA 3 GPUs (RX 7000 series) include AI accelerators for similar matrix acceleration, while older GPUs rely solely on their standard compute units.
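If you're unsure whether your NVIDIA card has Tensor Cores, one quick check (assuming PyTorch is installed) is its CUDA compute capability: Volta (7.0) and newer architectures include them, and the RTX 20-series reports 7.5.

```python
import torch

# Tensor Cores arrived with compute capability 7.0 (Volta); RTX 20-series
# (Turing) reports 7.5, RTX 30-series 8.6, RTX 40-series 8.9.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability {major}.{minor} -> Tensor Cores: {major >= 7}")
else:
    print("No CUDA GPU detected")
```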
💾 VRAM Requirements
VRAM (Video Memory) is critical for AI image generation. The entire model must fit in VRAM for optimal performance:
- SD 1.5: ~4GB minimum (6GB recommended)
- SDXL: ~8GB minimum (12GB recommended for ControlNet/refiner)
- Flux: ~12GB minimum (16GB+ recommended)
⚠️ Critical: If your model doesn't fit in VRAM, the GPU will swap data to system RAM, causing a catastrophic 10-50x slowdown. An 8GB GPU trying to run a 12GB model will be drastically slower than a properly sized card, even if the 8GB GPU has faster tensor cores.
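As a rough sanity check before downloading a model, you can compare the guideline figures above against your card's total VRAM. This sketch assumes PyTorch is installed and uses the minimums from the list as a lookup table.

```python
import torch

# Approximate minimum VRAM per model family (GB), taken from the list above.
# Rules of thumb only -- real usage depends on precision, resolution, and add-ons.
VRAM_MINIMUM_GB = {"sd15": 4, "sdxl": 8, "flux": 12}

def fits_in_vram(model_family: str, device: int = 0) -> bool:
    """Rough check: does this model family's minimum fit on the GPU?"""
    if not torch.cuda.is_available():
        return False
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1024**3
    return total_gb >= VRAM_MINIMUM_GB[model_family]

print(fits_in_vram("sdxl"))  # True on a 12 GB card, False on a 6 GB card
```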
⚡ Optimization Techniques
Several techniques reduce VRAM usage without major quality loss: xformers (memory-efficient attention), quantization (8-bit/4-bit models), model offloading (swapping parts of the model to CPU or disk), and VAE tiling (for high resolutions). However, these techniques may reduce performance by 10-30% compared to running the full model in VRAM.
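Here is a minimal sketch of those options using Hugging Face's diffusers library (SD.Next exposes equivalent toggles in its settings). The model ID and prompt are placeholders, and the xformers and accelerate packages must be installed for the attention and offloading calls.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in half precision -- fp16 roughly halves VRAM use on its own.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Memory-efficient attention (requires the xformers package).
pipe.enable_xformers_memory_efficient_attention()

# Keep submodules in system RAM until needed -- large VRAM savings, some speed cost.
pipe.enable_model_cpu_offload()

# Decode the VAE in tiles so high-resolution images fit in memory.
pipe.enable_vae_tiling()

image = pipe("a lighthouse at dusk, golden hour", num_inference_steps=20).images[0]
image.save("lighthouse.png")
```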