← Back to homepage
Diffusion Detective: Interpretable Stable Diffusion
A research system that extracts cross-attention maps, applies semantic steering via CLIP embedding arithmetic, and generates natural language explanations for every image generated.
Stable Diffusion 2.1 PyTorch FastAPI React CLIP GPT-4o-mini Python 67% / JS 31%
Diffusion Detective interface
100%
Attention extraction accuracy (MAE < 10⁻⁶)
94%
Semantic steering success rate
2.7%
Overhead from attention extraction
250ms
Narrative generation (GPT-4o-mini)

What It Does

Diffusion Detective makes Stable Diffusion interpretable. For any generated image it provides three capabilities that standard diffusion pipelines lack entirely:

Zero-Approximation Attention Extraction

Cross-attention probabilities are extracted directly from the UNet's transformer layers at every denoising timestep: no approximation, validated via probability sum = 1.0 (MAE < 10⁻⁶). Shows exactly which tokens the model attends to while drawing each part of the image.

Semantic Steering via CLIP Algebra

Edit images after the prompt, without rerunning: by adding or subtracting CLIP embedding vectors to the latent space during denoising. E.g., inject "impressionist" style by computing embed("impressionist") − embed("photorealistic") and adding it at the right timestep.

LLM-Powered Explanations

GPT-4o-mini reads the raw attention logs and writes a three-stage narrative: Setup (what the model focused on first), Comparison (how intervention changed attention), Insight (what it means). Human-readable transparency.

How It Works

Attention extraction hooks into the UNet's attention computation during each denoising step:

# During denoising step t attention_probs = softmax(Q @ K.T / sqrt(d_k)) # [H, W, num_tokens] # Hooked and stored at every timestep: zero approximation

Semantic steering uses CLIP embedding arithmetic to compute a steering vector applied mid-generation:

steering_vector = clip_encode(attribute) - clip_encode(concept) latents_t = latents_t + steering_vector * strength # Optimal window: steps 40–20 (semantic attributes form before fine details)

The React frontend shows side-by-side comparison of the baseline and steered image, with the attention heatmap and narrative overlaid.

Architecture

Why This Matters

Standard diffusion models are black boxes: users have no insight into why an image looks the way it does. Diffusion Detective is a prototype for a future where generative AI is auditable: you can see what the model attended to, steer it without retraining, and read a plain-English explanation of its decisions. This is foundational work toward trustworthy generative AI.