Stable Diffusion Tutorial: From Install to First Image
Learn how to install and use Stable Diffusion in 2026. This step-by-step tutorial covers installation on Windows/Mac/Linux, your first image generation, and prompt writing tips.
The first time I generated an image with Stable Diffusion, I spent six hours fighting with CUDA drivers and virtual environments—only to produce something that looked like a fever dream painted by a broken printer. Today, the process is dramatically simpler.
Stable Diffusion remains the most powerful open-source image generation model you can run locally. Unlike DALL-E or Midjourney, there’s no subscription, no content restrictions (beyond what you impose), and no sending your prompts to someone else’s servers. You own the entire process.
This tutorial will take you from zero to generating your first images. I’ll cover installation on Windows, Mac, and Linux, explain how to write effective prompts, and share the tips I wish I’d known from day one.
What is Stable Diffusion?
Stable Diffusion is a deep learning text-to-image model first released in 2022 by Stability AI together with researchers from CompVis and Runway. It takes a text description (called a “prompt”) and generates an image that matches—or attempts to match—that description.
Why Stable Diffusion?
Open source and local: You run it on your own hardware. No API costs, no usage limits, no moderation (beyond what you choose).
Customizable: Thousands of custom models, styles, and extensions are available. Want anime art? Photorealistic portraits? Specific artists’ styles? Custom models exist for all of these.
Private: Your prompts never leave your computer. Generate whatever you want without it hitting someone else’s logs.
Free: Once you’ve invested in hardware (which you may already have), there are no ongoing costs.
How It Works (Simplified)
Stable Diffusion uses a technique called “diffusion” that works by starting with random noise and gradually removing it to form an image. The text encoder (CLIP) translates your prompt into a format the model understands, and the model generates an image that matches.
Think of it like a sculptor starting with a rough block and progressively refining it based on instructions.
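If you’re curious what that loop looks like outside a GUI, here’s a minimal sketch using Hugging Face’s diffusers library. This is entirely optional and separate from the WebUI covered below; the model ID shown is the official SDXL base checkpoint, and any Stable Diffusion checkpoint works the same way.

```python
# Minimal text-to-image sketch with the diffusers library (optional; the
# WebUI covered below wraps all of this behind a friendlier interface).
import torch
from diffusers import DiffusionPipeline

# Load a checkpoint. The text encoder and denoising model come bundled.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # use "mps" on Apple Silicon

# The prompt is encoded, random latent noise is created, and the model
# denoises it step by step until an image emerges.
image = pipe(
    "a serene mountain landscape at sunset, photorealistic",
    num_inference_steps=30,
).images[0]
image.save("mountain.png")
```

The rest of this tutorial uses the WebUI instead, which gives you the same controls without writing any code.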
Current Versions
As of 2026, the main versions you’ll encounter are:
- SDXL (Stable Diffusion XL): The flagship model. Best quality, requires more VRAM (8GB+ recommended)
- SD 1.5: The classic version. Lower VRAM requirements (4GB+), massive ecosystem of fine-tuned models
- SD 3.5: Newer architecture with improved prompt following; its ecosystem is still maturing
- FLUX: Open-weight models from Black Forest Labs (founded by former Stability AI researchers), excellent quality
For beginners, I recommend starting with SDXL if your GPU supports it, or SD 1.5 if you’re on older hardware.
System Requirements
Let’s make sure your computer can run Stable Diffusion before we start installing.
Minimum Requirements
GPU (Recommended):
- NVIDIA GPU with 6GB+ VRAM (8GB+ for SDXL)
- RTX 3060 or better recommended
- AMD GPUs work but with more complex setup
- Apple Silicon Macs (M1/M2/M3) work great
CPU-Only (Possible but slow):
- 16GB+ RAM
- Expect 5-15 minutes per image instead of seconds
Storage:
- 10GB+ for base installation
- 50GB+ if you want multiple models
RAM:
- 16GB minimum
- 32GB recommended for SDXL
Checking Your GPU
On Windows, open Command Prompt and run:
nvidia-smi
This shows your GPU model and VRAM. Check the memory column (for example, “0MiB / 12288MiB”); the second number is your total VRAM.
On Mac, Apple Silicon chips (M1, M2, M3, M4) all work with Stable Diffusion using Core ML or MPS (Metal Performance Shaders).
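If you already have Python and PyTorch installed, a small script gives the same answer on any platform. This is optional; nvidia-smi is enough for most people.

```python
# Quick cross-platform GPU check (assumes PyTorch is installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS) is available")
else:
    print("No GPU detected - expect very slow CPU-only generation")
```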
Which Hardware for Which Model
| Your VRAM | Recommended Model |
|---|---|
| 4GB | SD 1.5 (with optimizations) |
| 6GB | SD 1.5 (comfortable) |
| 8GB | SDXL (with optimizations) |
| 12GB+ | SDXL (full quality), FLUX |
| Apple M1/M2/M3/M4 | SDXL works well (16GB+ unified memory recommended) |
Installation Guide
There are several ways to run Stable Diffusion. I’ll cover the most popular option: AUTOMATIC1111’s Stable Diffusion WebUI. It’s free, feature-rich, and has the largest community.
Windows Installation
Step 1: Install Python 3.10.x
Download from python.org. Stick with 3.10; newer Python versions often cause dependency problems with the WebUI. During installation, make sure to check “Add Python to PATH.”
Verify installation:
python --version
You should see Python 3.10.x.
Step 2: Install Git
Download from git-scm.com. Use default installation options.
Step 3: Clone the WebUI Repository
Open Command Prompt and navigate to where you want to install (e.g., C:\):
cd C:\
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
Step 4: Download a Model
Models are large files (2-7GB). Download the official SDXL base checkpoint or an SD 1.5 checkpoint from Hugging Face (community fine-tunes are also available on Civitai).
Place the downloaded .safetensors file in:
stable-diffusion-webui/models/Stable-diffusion/
Step 5: Launch
cd stable-diffusion-webui
webui-user.bat
First launch takes 10-20 minutes as it downloads dependencies. Once complete, a URL will appear (usually http://127.0.0.1:7860). Open this in your browser.
Mac Installation (Apple Silicon)
Step 1: Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 2: Install Requirements
brew install cmake protobuf rust python@3.10 git wget
Step 3: Clone and Run
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh
For Apple Silicon, the script automatically uses MPS (Metal) acceleration.
Step 4: Download Model
Same as Windows—download a model and place it in models/Stable-diffusion/.
Linux Installation
Ubuntu/Debian (on newer releases where python3.10 is no longer in the default repositories, add the deadsnakes PPA first):
sudo apt update
sudo apt install git python3.10 python3.10-venv python3-pip
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh
With NVIDIA GPU, ensure you have CUDA drivers installed:
nvidia-smi
If this shows your GPU, you’re set. If not, install NVIDIA drivers first.
Alternative: ComfyUI
If AUTOMATIC1111’s WebUI feels too slow, or you want finer control over the generation pipeline, consider ComfyUI. It uses a node-based interface that’s more visual and often faster, though the learning curve is steeper. The installation process is similar.
Your First Image
You’ve got it running. Let’s generate something.
The Interface
AUTOMATIC1111’s WebUI has two main modes:
- txt2img: Generate images from text prompts
- img2img: Modify existing images based on prompts
We’ll start with txt2img.
Your First Prompt
In the prompt box, type something simple:
A serene mountain landscape at sunset, golden light, snow-capped peaks, reflection in a calm lake, photorealistic
Leave all settings at default for now. Click “Generate.”
Your first image should appear in 10-60 seconds depending on your hardware.
Understanding the Results
Your first image probably isn’t perfect. That’s normal. Image generation is iterative—you tweak prompts and settings until you get what you want.
Each generation produces a seed number. If you like an image and want variations, note the seed and reuse it with slightly modified prompts.
Settings That Matter (For Beginners)
Sampling steps: How many refinement iterations. Default 20-30 is usually fine. Higher = slower but sometimes better.
CFG Scale (Classifier-Free Guidance): How strictly to follow your prompt. 7-12 is typical. Higher = more literal but can look artificial.
Sampler: The algorithm used. “Euler a” or “DPM++ 2M Karras” are good defaults.
Size: 512x512 for SD 1.5, 1024x1024 for SDXL. Other sizes may work but can degrade quality.
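For reference, these settings map one-to-one onto parameters if you ever script generation with the diffusers library. The sketch below is purely illustrative; in the WebUI they are just sliders and dropdowns.

```python
# How the beginner settings map to code (illustrative only).
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# "Sampler": swap the scheduler (Euler a shown here).
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a serene mountain landscape at sunset, photorealistic",
    num_inference_steps=25,    # "Sampling steps"
    guidance_scale=7.5,        # "CFG Scale"
    width=1024, height=1024,   # "Size" (SDXL native resolution)
).images[0]
image.save("settings_demo.png")
```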
Understanding Prompts
Prompts are how you communicate with Stable Diffusion. Better prompts = better images.
Prompt Structure
Good prompts typically include:
- Subject: What you’re creating (a woman, a landscape, a car)
- Style: The artistic approach (photorealistic, oil painting, anime)
- Lighting: How it’s lit (golden hour, soft lighting, neon)
- Details: Specific elements (red hair, foggy atmosphere, vintage)
- Quality modifiers: Technical qualities (highly detailed, 8K, professional photography)
Example:
Portrait of a young woman with flowing red hair, soft studio lighting, bokeh background, highly detailed, professional photography, 8K
Negative Prompts
Tell the model what to avoid. In the negative prompt box:
blurry, low quality, deformed hands, extra fingers, bad anatomy, distorted face
Negative prompts are often as important as positive prompts.
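For the curious: the WebUI can also be driven programmatically if you add --api to its launch arguments, and the positive/negative pair is just two fields in the request. A rough sketch, assuming the WebUI is running locally with that flag (field names follow its /sdapi/v1/txt2img endpoint):

```python
# Calling a locally running WebUI (started with the --api flag).
import base64, io
import requests
from PIL import Image

payload = {
    "prompt": "portrait of a young woman, soft studio lighting, highly detailed",
    "negative_prompt": "blurry, low quality, deformed hands, bad anatomy",
    "steps": 25,
    "cfg_scale": 7,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# The response contains base64-encoded images.
img_b64 = resp.json()["images"][0]
Image.open(io.BytesIO(base64.b64decode(img_b64))).save("portrait.png")
```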
Prompt Weights
Emphasize specific elements using syntax:
- (important word) - slight emphasis
- ((important word)) - more emphasis
- (word:1.3) - specific weight (1.0 is default)
- (unwanted:0.5) - reduce importance
Example:
A ((magical forest)), soft morning light, (fairy tale atmosphere:1.2), mist, ethereal
Common Prompt Mistakes
Too vague: “A nice picture” tells the model nothing. Be specific.
Too long: CLIP processes prompts in 75-token chunks; terms buried deep in a long prompt carry less weight and may be dropped entirely in some tools. Focus on what matters.
Conflicting instructions: “A day and night scene” confuses the model.
Forgetting quality modifiers: Always include quality terms unless you’re going for a specific aesthetic.
Advanced Settings
Once you’re comfortable with basics, these settings help you get exactly what you want.
High-Resolution Fix (Hires. fix)
SDXL’s native resolution is 1024x1024. Want larger? Enable “Hires. fix,” which generates at the native resolution, upscales the result, and then runs a second denoising pass to add detail. This avoids the duplicated subjects and broken compositions that occur when generating directly at very high resolutions.
Img2Img
Start with an existing image and modify it based on a prompt. Great for:
- Refining images you generated
- Changing styles while keeping composition
- Adding or removing elements
Denoising strength controls how much the image changes:
- 0.3-0.5: Minor modifications
- 0.6-0.8: Significant changes while keeping structure
- 0.9+: Almost complete regeneration
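The same denoising-strength knob exists if you script img2img with diffusers. This sketch is illustrative only and assumes you have a starting image saved locally (filenames are placeholders):

```python
# Image-to-image with an explicit denoising strength (illustrative sketch).
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("mountain.png").convert("RGB")  # any starting image

image = pipe(
    "the same landscape as an impressionist oil painting",
    image=init_image,
    strength=0.6,  # 0.3-0.5 = minor edits, 0.9+ = near-total regeneration
).images[0]
image.save("mountain_painting.png")
```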
ControlNet
An extension that adds control over composition. Use sketches, edge detection, or pose estimation to guide generation. Essential for precise work.
Install from the Extensions tab in WebUI.
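In the WebUI, the extension handles everything through the interface. Purely for illustration, here is roughly what the same idea looks like in script form with diffusers, assuming you already have an edge-map image; the model IDs are common public checkpoints and may need substituting:

```python
# ControlNet sketch: a canny edge map constrains composition while the
# prompt controls content and style (model IDs are examples).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # any SD 1.5 checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edges = Image.open("edge_map.png")  # an edge image you prepared beforehand
image = pipe("a cozy cottage in a misty forest", image=edges).images[0]
image.save("controlled.png")
```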
LoRAs
Small files (10-200MB) that modify the model’s output—adding specific characters, styles, or concepts. Thousands are available on Civitai.
Place in models/Lora/ and reference in prompts using <lora:filename:weight>.
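In the WebUI, the <lora:filename:weight> syntax is all you need. If you ever work in Python instead, diffusers can load the same .safetensors file; the paths and filename below are placeholders:

```python
# Loading a LoRA file in script form (illustrative; paths are placeholders).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Point at the folder and file you downloaded from Civitai.
pipe.load_lora_weights("models/Lora", weight_name="my_style.safetensors")
# The LoRA's influence can be adjusted, e.g. pipe.fuse_lora(lora_scale=0.8)

# Include the LoRA's trigger word in the prompt if it has one.
image = pipe("a knight in ornate armor").images[0]
image.save("lora_demo.png")
```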
Best Practices and Tips
After generating thousands of images, here’s what I’ve learned:
Iterate, Don’t Perfect
Generate batches of 4-8 images. Most won’t be exactly right. Pick the best, note its seed, adjust prompts, and generate again. This is faster than trying to write the perfect prompt upfront.
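If you script this loop, the seed is just a random-generator state you pass in. A rough sketch of the batch-then-refine workflow with diffusers, shown only as an illustration:

```python
# Batch-and-iterate: generate candidates with known seeds, then reuse the
# seed of the best one with a refined prompt (illustrative sketch).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse in a storm, dramatic lighting"
for seed in (101, 102, 103, 104):
    gen = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=gen).images[0].save(f"candidate_{seed}.png")

# Say seed 103 looked best: reuse it with a tweaked prompt.
gen = torch.Generator("cuda").manual_seed(103)
pipe(prompt + ", crashing waves, moonlight", generator=gen).images[0].save("refined.png")
```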
Save Your Workflows
When you find good prompt/setting combinations, save them. The WebUI supports styles and presets.
Learn from Others
Civitai and PromptHero show prompts used to create popular images. Study what works.
Respect Composition
Stable Diffusion struggles with:
- Hands and fingers (though newer models are better)
- Text in images
- Counting specific numbers of objects
- Very complex multi-character scenes
Work around limitations rather than fighting them.
Build a Prompt Library
Keep a document of prompts that work well. Reuse and modify rather than starting from scratch.
Troubleshooting Common Issues
“CUDA out of memory”
Your GPU doesn’t have enough VRAM. Try:
- Reducing image size
- Adding --lowvram or --medvram to the COMMANDLINE_ARGS line in webui-user.bat (or webui-user.sh on Mac/Linux)
- Closing other GPU-intensive applications
- Using xformers (add --xformers to the launch arguments)
“No module named X”
Dependency issues. Try:
pip install -r requirements.txt
Or delete the venv folder and let it reinstall.
Black or corrupted images
Usually a VRAM or precision issue. Try:
- Adding --no-half to the launch arguments
- Reducing batch size to 1
- Updating GPU drivers
WebUI won’t start
Check Python version (need 3.10.x specifically), make sure Git is installed, and verify you have enough disk space.
Slow generation
- Enable xformers: add --xformers to the launch arguments
- On NVIDIA, try --opt-sdp-attention
- Consider ComfyUI for better performance
- Close background applications
FAQ
Is Stable Diffusion free to use?
Yes, completely. The software is open source and models are freely available. Your only cost is the hardware to run it. Commercial use is allowed under the licenses of the classic models (SD 1.5 and SDXL), but newer releases such as SD 3.5 and FLUX.1 dev attach their own terms, so always check the license of the specific checkpoint you use, especially for fine-tuned versions.
Can I run Stable Diffusion without a GPU?
Technically yes, using CPU-only mode. But expect 5-15 minutes per image instead of seconds. It’s usable for occasional experimentation but not practical for regular use. Apple Silicon Macs are a good GPU alternative—they work well with Stable Diffusion.
Which is better, Stable Diffusion or Midjourney?
Different tools for different needs. Midjourney produces stunning results with minimal prompting but costs money and runs through Discord. Stable Diffusion is free, runs locally, allows more control and customization, but requires more setup and prompt knowledge. Many serious AI artists use both.
Are there content restrictions?
Locally, you can generate anything your model allows. The base models have some built-in restrictions, but fine-tuned community models exist with various policies. Obviously, don’t create harmful or illegal content. Use responsibly.
How do I get better at prompting?
Practice and study. Generate hundreds of images. Look at prompts used for images you admire on Civitai or PromptHero. Experiment with different structures. Save what works. It’s a skill that develops over time.
Conclusion
Running Stable Diffusion locally gives you more control, privacy, and creative freedom than any cloud-based alternative. The initial setup takes effort, but once running, you have a tool that rivals services costing hundreds per month.
Start simple. Generate images with basic prompts. Experiment with settings. Study what works. Over time, you’ll develop an intuition for prompts and workflows that produce exactly what you envision.
The creative possibilities are genuinely infinite. Now go create something.
Want to explore more AI creative tools? Check out our guides on best AI image generators, AI voice cloning, and text-to-video AI.