GPU Benchmarks
Performance comparisons for AI image generation with open source models
🙏 Huge shout-out to vladmandic for creating SD.Next and making this benchmark data possible!
What is it/s? Iterations per second (it/s) measures how fast a GPU generates AI images. Higher numbers = faster generation. A GPU with 20 it/s is twice as fast as one with 10 it/s. Learn more ↓
Compare GPUs
Filtered by SDXL: showing only benchmarks run with SDXL models, for a fair comparison
⚠️ Note: 10 selected GPUs have no SDXL benchmarks and are hidden from the chart (GeForce RTX 5090 D, GeForce RTX 4090, GeForce RTX 5090, GeForce RTX 4080 SUPER, GeForce RTX 4080, Radeon RX 7900 XTX, Radeon RX 7900 XT, GeForce RTX 5060 Ti, GeForce RTX 5060, Radeon RX 6700 XT)
Select GPUs to compare (10 selected):
- NVIDIA (51 models)
- AMD (28 models)
- Intel Arc (1 model)
Benchmark Details
No SDXL benchmarks are available for any of the 10 currently selected GPUs (see the note above).
Understanding Iterations Per Second (it/s)
Iterations per second (it/s) is the key metric for measuring GPU performance in AI image generation. It represents how many denoising steps the GPU can process each second when generating images.
Higher = Faster
A GPU with 20 it/s generates images twice as fast as one with 10 it/s. For a 20-step image, the 20 it/s GPU spends 1 second on denoising, while the 10 it/s GPU takes 2 seconds.
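The arithmetic is simple enough to sketch in a few lines of Python. The numbers below are illustrative; real end-to-end times also include model loading, prompt encoding, and VAE decoding.

```python
# Estimate denoising time for one image from a GPU's it/s rating.
def generation_time(steps: int, its_per_second: float) -> float:
    """Seconds of denoising for a single image (illustrative only)."""
    return steps / its_per_second

# A 20-step image: 20 it/s -> 1.0 s, 10 it/s -> 2.0 s of denoising.
for rate in (20.0, 10.0):
    print(f"{rate:>4.0f} it/s -> {generation_time(20, rate):.1f} s for 20 steps")
```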
Why the Range?
Performance varies based on batch size, resolution, model complexity, and settings. The range shows minimum to maximum performance across different configurations.
Technical Factors
🔷 Tensor Cores
Modern NVIDIA GPUs (RTX 20-series and newer) include specialized Tensor Cores designed for AI workloads. These cores excel at the matrix multiplication operations that power diffusion models, delivering 2-3x faster performance than standard CUDA cores alone. AMD's RDNA 3 GPUs (RX 7000 series) include AI accelerators for similar matrix acceleration, while older GPUs rely solely on their standard compute units.
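If you're unsure whether your NVIDIA card has Tensor Cores, one quick check (assuming PyTorch is installed) is its CUDA compute capability: Volta (7.0) and newer architectures include them, and the RTX 20-series reports 7.5.

```python
import torch

# Tensor Cores arrived with compute capability 7.0 (Volta); RTX 20-series
# (Turing) reports 7.5, RTX 30-series 8.6, RTX 40-series 8.9.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability {major}.{minor} -> Tensor Cores: {major >= 7}")
else:
    print("No CUDA GPU detected")
```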
💾 VRAM Requirements
VRAM (Video Memory) is critical for AI image generation. The entire model must fit in VRAM for optimal performance:
- SD 1.5: ~4GB minimum (6GB recommended)
- SDXL: ~8GB minimum (12GB recommended for ControlNet/refiner)
- Flux: ~12GB minimum (16GB+ recommended)
⚠️ Critical: If your model doesn't fit in VRAM, the GPU will swap data to system RAM, causing a catastrophic 10-50x slowdown. An 8GB GPU trying to run a 12GB model will be drastically slower than a properly sized card, even if the 8GB GPU has faster tensor cores.
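As a rough sanity check before downloading a model, you can compare the guideline figures above against your card's total VRAM. This sketch assumes PyTorch is installed and uses the minimums from the list as a lookup table.

```python
import torch

# Approximate minimum VRAM per model family (GB), taken from the list above.
# Rules of thumb only -- real usage depends on precision, resolution, and add-ons.
VRAM_MINIMUM_GB = {"sd15": 4, "sdxl": 8, "flux": 12}

def fits_in_vram(model_family: str, device: int = 0) -> bool:
    """Rough check: does this model family's minimum fit on the GPU?"""
    if not torch.cuda.is_available():
        return False
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1024**3
    return total_gb >= VRAM_MINIMUM_GB[model_family]

print(fits_in_vram("sdxl"))  # True on a 12 GB card, False on a 6 GB card
```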
⚡ Optimization Techniques
Several techniques reduce VRAM usage without major quality loss: xformers (memory-efficient attention), quantization (8-bit/4-bit models), model offloading (swapping parts of the model to CPU or disk), and VAE tiling (for high resolutions). However, these techniques may reduce performance by 10-30% compared to running the full model in VRAM.
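Here is a minimal sketch of those options using Hugging Face's diffusers library (SD.Next exposes equivalent toggles in its settings). The model ID and prompt are placeholders, and the xformers and accelerate packages must be installed for the attention and offloading calls.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL in half precision -- fp16 roughly halves VRAM use on its own.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Memory-efficient attention (requires the xformers package).
pipe.enable_xformers_memory_efficient_attention()

# Keep submodules in system RAM until needed -- large VRAM savings, some speed cost.
pipe.enable_model_cpu_offload()

# Decode the VAE in tiles so high-resolution images fit in memory.
pipe.enable_vae_tiling()

image = pipe("a lighthouse at dusk, golden hour", num_inference_steps=20).images[0]
image.save("lighthouse.png")
```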