Stable Diffusion Tutorial: From Install to First Image
Learn how to install and use Stable Diffusion in 2026. This step-by-step tutorial covers installation on Windows/Mac/Linux, your first image generation, and prompt writing tips.
The first time I generated an image with Stable Diffusion, I spent six hours fighting with CUDA drivers and virtual environments—only to produce something that looked like a fever dream painted by a broken printer. Today, the process is dramatically simpler.
Stable Diffusion remains the most powerful open-source image generation model you can run locally. Unlike DALL-E or Midjourney, there’s no subscription, no content restrictions (beyond what you impose), and no sending your prompts to someone else’s servers. You own the entire process.
This tutorial will take you from zero to generating your first images. I’ll cover installation on Windows, Mac, and Linux, explain how to write effective prompts, and share the tips I wish I’d known from day one.
What is Stable Diffusion?
Stable Diffusion is a deep learning text-to-image model first released in 2022 by Stability AI together with researchers from CompVis and Runway. It takes a text description (called a “prompt”) and generates an image that matches—or attempts to match—that description.
Why Stable Diffusion?
Open source and local: You run it on your own hardware. No API costs, no usage limits, no moderation (beyond what you choose).
Customizable: Thousands of custom models, styles, and extensions are available. Want anime art? Photorealistic portraits? Specific artists’ styles? Custom models exist for all of these.
Private: Your prompts never leave your computer. Generate whatever you want without it hitting someone else’s logs.
Free: Once you’ve invested in hardware (which you may already have), there are no ongoing costs.
How It Works (Simplified)
Stable Diffusion uses a technique called “diffusion” that works by starting with random noise and gradually removing it to form an image. The text encoder (CLIP) translates your prompt into a format the model understands, and the model generates an image that matches.
Think of it like a sculptor starting with a rough block and progressively refining it based on instructions.
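If you’re curious what that loop looks like outside a GUI, here’s a minimal sketch using Hugging Face’s diffusers library. This is entirely optional and separate from the WebUI covered below; the model ID shown is the official SDXL base checkpoint, and any Stable Diffusion checkpoint works the same way.

```python
# Minimal text-to-image sketch with the diffusers library (optional; the
# WebUI covered below wraps all of this behind a friendlier interface).
import torch
from diffusers import DiffusionPipeline

# Load a checkpoint. The text encoder and denoising model come bundled.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # use "mps" on Apple Silicon

# The prompt is encoded, random latent noise is created, and the model
# denoises it step by step until an image emerges.
image = pipe(
    "a serene mountain landscape at sunset, photorealistic",
    num_inference_steps=30,
).images[0]
image.save("mountain.png")
```

The rest of this tutorial uses the WebUI instead, which gives you the same controls without writing any code.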
Current Versions
As of 2026, the main versions you’ll encounter are:
- SDXL (Stable Diffusion XL): The flagship model. Best quality, requires more VRAM (8GB+ recommended)
- SD 1.5: The classic version. Lower VRAM requirements (4GB+), massive ecosystem of fine-tuned models
- SD 3.5: Newer architecture with improved prompt following; its ecosystem is still maturing
- FLUX: Open-weight models from Black Forest Labs (founded by former Stability AI researchers), excellent quality
For beginners, I recommend starting with SDXL if your GPU supports it, or SD 1.5 if you’re on older hardware.
System Requirements
Let’s make sure your computer can run Stable Diffusion before we start installing.
Minimum Requirements
GPU (Recommended):
- NVIDIA GPU with 6GB+ VRAM (8GB+ for SDXL)
- RTX 3060 or better recommended
- AMD GPUs work but with more complex setup
- Apple Silicon Macs (M1/M2/M3) work great
CPU-Only (Possible but slow):
- 16GB+ RAM
- Expect 5-15 minutes per image instead of seconds
Storage:
- 10GB+ for base installation
- 50GB+ if you want multiple models
RAM:
- 16GB minimum
- 32GB recommended for SDXL
Checking Your GPU
On Windows, open Command Prompt and run:
nvidia-smi
This shows your GPU model and VRAM. Check the memory column (for example, “0MiB / 12288MiB”); the second number is your total VRAM.
On Mac, Apple Silicon chips (M1, M2, M3, M4) all work with Stable Diffusion using Core ML or MPS (Metal Performance Shaders).
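If you already have Python and PyTorch installed, a small script gives the same answer on any platform. This is optional; nvidia-smi is enough for most people.

```python
# Quick cross-platform GPU check (assumes PyTorch is installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS) is available")
else:
    print("No GPU detected - expect very slow CPU-only generation")
```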
Which Hardware for Which Model
| Your VRAM | Recommended Model |
|---|---|
| 4GB | SD 1.5 (with optimizations) |
| 6GB | SD 1.5 (comfortable) |
| 8GB | SDXL (with optimizations) |
| 12GB+ | SDXL (full quality), FLUX |
| Apple M1/M2/M3/M4 | SDXL works well (16GB+ unified memory recommended) |
Installation Guide
There are several ways to run Stable Diffusion. I’ll cover the most popular option: AUTOMATIC1111’s Stable Diffusion WebUI. It’s free, feature-rich, and has the largest community.
Windows Installation
Step 1: Install Python 3.10.x
Download from python.org. Stick with 3.10; newer Python versions often cause dependency problems with the WebUI. During installation, make sure to check “Add Python to PATH.”
Verify installation:
python --version
You should see Python 3.10.x.
Step 2: Install Git
Download from git-scm.com. Use default installation options.
Step 3: Clone the WebUI Repository
Open Command Prompt and navigate to where you want to install (e.g., C:\):
cd C:\
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
Step 4: Download a Model
Models are large files (2-7GB). Download the official SDXL base checkpoint or an SD 1.5 checkpoint from Hugging Face (community fine-tunes are also available on Civitai).
Place the downloaded .safetensors file in:
stable-diffusion-webui/models/Stable-diffusion/
Step 5: Launch
cd stable-diffusion-webui
webui-user.bat
First launch takes 10-20 minutes as it downloads dependencies. Once complete, a URL will appear (usually http://127.0.0.1:7860). Open this in your browser.
Mac Installation (Apple Silicon)
Step 1: Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 2: Install Requirements
brew install cmake protobuf rust python@3.10 git wget
Step 3: Clone and Run
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh
For Apple Silicon, the script automatically uses MPS (Metal) acceleration.
Step 4: Download Model
Same as Windows—download a model and place it in models/Stable-diffusion/.
Linux Installation
Ubuntu/Debian (on newer releases where python3.10 is no longer in the default repositories, add the deadsnakes PPA first):
sudo apt update
sudo apt install git python3.10 python3.10-venv python3-pip
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh
With NVIDIA GPU, ensure you have CUDA drivers installed:
nvidia-smi
If this shows your GPU, you’re set. If not, install NVIDIA drivers first.
Alternative: ComfyUI
If AUTOMATIC1111’s WebUI feels too slow, or you want finer control over the generation pipeline, consider ComfyUI. It uses a node-based interface that’s more visual and often faster, though the learning curve is steeper. The installation process is similar.
Your First Image
You’ve got it running. Let’s generate something.
The Interface
AUTOMATIC1111’s WebUI has two main modes:
- txt2img: Generate images from text prompts
- img2img: Modify existing images based on prompts
We’ll start with txt2img.
Your First Prompt
In the prompt box, type something simple:
A serene mountain landscape at sunset, golden light, snow-capped peaks, reflection in a calm lake, photorealistic
Leave all settings at default for now. Click “Generate.”
Your first image should appear in 10-60 seconds depending on your hardware.
Understanding the Results
Your first image probably isn’t perfect. That’s normal. Image generation is iterative—you tweak prompts and settings until you get what you want.
Each generation produces a seed number. If you like an image and want variations, note the seed and reuse it with slightly modified prompts.
Settings That Matter (For Beginners)
Sampling steps: How many refinement iterations. Default 20-30 is usually fine. Higher = slower but sometimes better.
CFG Scale (Classifier-Free Guidance): How strictly to follow your prompt. 7-12 is typical. Higher = more literal but can look artificial.
Sampler: The algorithm used. “Euler a” or “DPM++ 2M Karras” are good defaults.
Size: 512x512 for SD 1.5, 1024x1024 for SDXL. Other sizes may work but can degrade quality.
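For reference, these settings map one-to-one onto parameters if you ever script generation with the diffusers library. The sketch below is purely illustrative; in the WebUI they are just sliders and dropdowns.

```python
# How the beginner settings map to code (illustrative only).
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# "Sampler": swap the scheduler (Euler a shown here).
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a serene mountain landscape at sunset, photorealistic",
    num_inference_steps=25,    # "Sampling steps"
    guidance_scale=7.5,        # "CFG Scale"
    width=1024, height=1024,   # "Size" (SDXL native resolution)
).images[0]
image.save("settings_demo.png")
```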
Understanding Prompts
Prompts are how you communicate with Stable Diffusion. Better prompts = better images.
Prompt Structure
Good prompts typically include:
- Subject: What you’re creating (a woman, a landscape, a car)
- Style: The artistic approach (photorealistic, oil painting, anime)
- Lighting: How it’s lit (golden hour, soft lighting, neon)
- Details: Specific elements (red hair, foggy atmosphere, vintage)
- Quality modifiers: Technical qualities (highly detailed, 8K, professional photography)
Example:
Portrait of a young woman with flowing red hair, soft studio lighting, bokeh background, highly detailed, professional photography, 8K
Negative Prompts
Tell the model what to avoid. In the negative prompt box:
blurry, low quality, deformed hands, extra fingers, bad anatomy, distorted face
Negative prompts are often as important as positive prompts.
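For the curious: the WebUI can also be driven programmatically if you add --api to its launch arguments, and the positive/negative pair is just two fields in the request. A rough sketch, assuming the WebUI is running locally with that flag (field names follow its /sdapi/v1/txt2img endpoint):

```python
# Calling a locally running WebUI (started with the --api flag).
import base64, io
import requests
from PIL import Image

payload = {
    "prompt": "portrait of a young woman, soft studio lighting, highly detailed",
    "negative_prompt": "blurry, low quality, deformed hands, bad anatomy",
    "steps": 25,
    "cfg_scale": 7,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# The response contains base64-encoded images.
img_b64 = resp.json()["images"][0]
Image.open(io.BytesIO(base64.b64decode(img_b64))).save("portrait.png")
```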
Prompt Weights
Emphasize specific elements using syntax:
- (important word) - slight emphasis
- ((important word)) - more emphasis
- (word:1.3) - specific weight (1.0 is default)
- (unwanted:0.5) - reduce importance
Example:
A ((magical forest)), soft morning light, (fairy tale atmosphere:1.2), mist, ethereal
Common Prompt Mistakes
Too vague: “A nice picture” tells the model nothing. Be specific.
Too long: CLIP processes prompts in 75-token chunks; terms buried deep in a long prompt carry less weight and may be dropped entirely in some tools. Focus on what matters.
Conflicting instructions: “A day and night scene” confuses the model.
Forgetting quality modifiers: Always include quality terms unless you’re going for a specific aesthetic.
Advanced Settings
Once you’re comfortable with basics, these settings help you get exactly what you want.
High-Resolution Fix (Hires. fix)
SDXL’s native resolution is 1024x1024. Want larger? Enable “Hires. fix,” which generates at the native resolution, upscales the result, and then runs a second denoising pass to add detail. This avoids the duplicated subjects and broken compositions that occur when generating directly at very high resolutions.
Img2Img
Start with an existing image and modify it based on a prompt. Great for:
- Refining images you generated
- Changing styles while keeping composition
- Adding or removing elements
Denoising strength controls how much the image changes:
- 0.3-0.5: Minor modifications
- 0.6-0.8: Significant changes while keeping structure
- 0.9+: Almost complete regeneration
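The same denoising-strength knob exists if you script img2img with diffusers. This sketch is illustrative only and assumes you have a starting image saved locally (filenames are placeholders):

```python
# Image-to-image with an explicit denoising strength (illustrative sketch).
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("mountain.png").convert("RGB")  # any starting image

image = pipe(
    "the same landscape as an impressionist oil painting",
    image=init_image,
    strength=0.6,  # 0.3-0.5 = minor edits, 0.9+ = near-total regeneration
).images[0]
image.save("mountain_painting.png")
```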
ControlNet
An extension that adds control over composition. Use sketches, edge detection, or pose estimation to guide generation. Essential for precise work.
Install from the Extensions tab in WebUI.
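In the WebUI, the extension handles everything through the interface. Purely for illustration, here is roughly what the same idea looks like in script form with diffusers, assuming you already have an edge-map image; the model IDs are common public checkpoints and may need substituting:

```python
# ControlNet sketch: a canny edge map constrains composition while the
# prompt controls content and style (model IDs are examples).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # any SD 1.5 checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edges = Image.open("edge_map.png")  # an edge image you prepared beforehand
image = pipe("a cozy cottage in a misty forest", image=edges).images[0]
image.save("controlled.png")
```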
LoRAs
Small files (10-200MB) that modify the model’s output—adding specific characters, styles, or concepts. Thousands are available on Civitai.
Place in models/Lora/ and reference in prompts using <lora:filename:weight>.
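In the WebUI, the <lora:filename:weight> syntax is all you need. If you ever work in Python instead, diffusers can load the same .safetensors file; the paths and filename below are placeholders:

```python
# Loading a LoRA file in script form (illustrative; paths are placeholders).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Point at the folder and file you downloaded from Civitai.
pipe.load_lora_weights("models/Lora", weight_name="my_style.safetensors")
# The LoRA's influence can be adjusted, e.g. pipe.fuse_lora(lora_scale=0.8)

# Include the LoRA's trigger word in the prompt if it has one.
image = pipe("a knight in ornate armor").images[0]
image.save("lora_demo.png")
```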
Best Practices and Tips
After generating thousands of images, here’s what I’ve learned:
Iterate, Don’t Perfect
Generate batches of 4-8 images. Most won’t be exactly right. Pick the best, note its seed, adjust prompts, and generate again. This is faster than trying to write the perfect prompt upfront.
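If you script this loop, the seed is just a random-generator state you pass in. A rough sketch of the batch-then-refine workflow with diffusers, shown only as an illustration:

```python
# Batch-and-iterate: generate candidates with known seeds, then reuse the
# seed of the best one with a refined prompt (illustrative sketch).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse in a storm, dramatic lighting"
for seed in (101, 102, 103, 104):
    gen = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=gen).images[0].save(f"candidate_{seed}.png")

# Say seed 103 looked best: reuse it with a tweaked prompt.
gen = torch.Generator("cuda").manual_seed(103)
pipe(prompt + ", crashing waves, moonlight", generator=gen).images[0].save("refined.png")
```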
Save Your Workflows
When you find good prompt/setting combinations, save them. The WebUI supports styles and presets.
Learn from Others
Civitai and PromptHero show prompts used to create popular images. Study what works.
Respect Composition
Stable Diffusion struggles with:
- Hands and fingers (though newer models are better)
- Text in images
- Counting specific numbers of objects
- Very complex multi-character scenes
Work around limitations rather than fighting them.
Build a Prompt Library
Keep a document of prompts that work well. Reuse and modify rather than starting from scratch.
Troubleshooting Common Issues
“CUDA out of memory”
Your GPU doesn’t have enough VRAM. Try:
- Reducing image size
- Adding --lowvram or --medvram to the COMMANDLINE_ARGS line in webui-user.bat (or webui-user.sh on Mac/Linux)
- Closing other GPU-intensive applications
- Using xformers (add --xformers to the launch arguments)
“No module named X”
Dependency issues. Try:
pip install -r requirements.txt
Or delete the venv folder and let it reinstall.
Black or corrupted images
Usually a VRAM or precision issue. Try:
- Adding --no-half to the launch arguments
- Reducing batch size to 1
- Updating GPU drivers
WebUI won’t start
Check Python version (need 3.10.x specifically), make sure Git is installed, and verify you have enough disk space.
Slow generation
- Enable xformers: add --xformers to the launch arguments
- On NVIDIA, try --opt-sdp-attention
- Consider ComfyUI for better performance
- Close background applications
FAQ
Is Stable Diffusion free to use?
Yes, completely. The software is open source and models are freely available. Your only cost is the hardware to run it. Commercial use is allowed under the licenses of the classic models (SD 1.5 and SDXL), but newer releases such as SD 3.5 and FLUX.1 dev attach their own terms, so always check the license of the specific checkpoint you use, especially for fine-tuned versions.
Can I run Stable Diffusion without a GPU?
Technically yes, using CPU-only mode. But expect 5-15 minutes per image instead of seconds. It’s usable for occasional experimentation but not practical for regular use. Apple Silicon Macs are a good GPU alternative—they work well with Stable Diffusion.
Which is better, Stable Diffusion or Midjourney?
Different tools for different needs. Midjourney produces stunning results with minimal prompting but costs money and runs through Discord. Stable Diffusion is free, runs locally, allows more control and customization, but requires more setup and prompt knowledge. Many serious AI artists use both.
Are there content restrictions?
Locally, you can generate anything your model allows. The base models have some built-in restrictions, but fine-tuned community models exist with various policies. Obviously, don’t create harmful or illegal content. Use responsibly.
How do I get better at prompting?
Practice and study. Generate hundreds of images. Look at prompts used for images you admire on Civitai or PromptHero. Experiment with different structures. Save what works. It’s a skill that develops over time.
Conclusion
Running Stable Diffusion locally gives you more control, privacy, and creative freedom than any cloud-based alternative. The initial setup takes effort, but once running, you have a tool that rivals services costing hundreds per month.
Start simple. Generate images with basic prompts. Experiment with settings. Study what works. Over time, you’ll develop an intuition for prompts and workflows that produce exactly what you envision.
The creative possibilities are genuinely infinite. Now go create something.
Want to explore more AI creative tools? Check out our guides on best AI image generators, AI voice cloning, and text-to-video AI.