Standard diffusion models use a fixed classifier-free guidance (CFG) scale throughout all denoising timesteps. This is simple but suboptimal: early timesteps (coarse structure) and late timesteps (fine details) operate in very different signal-to-noise regimes, yet both receive the same guidance push. The result is guidance artifacts (oversaturation, halos, and energy spikes) that become more visible at higher resolutions.
RectifiedHR asks: What does the latent energy landscape actually look like during sampling, and can we schedule guidance to keep it stable?
We profile the L2 norm of the latent tensor at each denoising step for a fixed set of prompts across multiple CFG schedules (constant, linear-increasing, linear-decreasing, cosine, step). This gives us an empirical energy trajectory: a diagnostic fingerprint of each schedule's stability.
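A minimal sketch of the profiling loop, assuming the sampler can be driven one step at a time. The names `latent_energy`, `profile_energy`, and `step_fn` are hypothetical illustrations, not the RectifiedHR API. Latents are flattened to plain lists of floats to keep the example dependency-free.

```python
import math
from typing import Callable, List, Sequence


def latent_energy(latent: Sequence[float]) -> float:
    """L2 norm of a (flattened) latent tensor."""
    return math.sqrt(sum(v * v for v in latent))


def profile_energy(
    x: List[float],
    num_steps: int,
    step_fn: Callable[[List[float], int], List[float]],
) -> List[float]:
    """Run `num_steps` denoising steps and record the latent energy after each one.

    `step_fn(x, i)` is a hypothetical stand-in for one sampler step at index i.
    """
    energies = []
    for i in range(num_steps):
        x = step_fn(x, i)
        energies.append(latent_energy(x))
    return energies


# Toy usage: a "step" that shrinks the latent by 10% each time.
trajectory = profile_energy([3.0, 4.0], num_steps=3, step_fn=lambda x, i: [0.9 * v for v in x])
print([round(e, 4) for e in trajectory])  # [4.5, 4.05, 3.645]
```

Running the same loop once per schedule and prompt yields one energy trajectory per schedule, which is the fingerprint compared across schedules.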
Rather than a fixed CFG scale w, we define a time-varying schedule w(t).
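For reference, the standard classifier-free guidance combination with a time-varying scale is shown below, where $\epsilon_\theta(x_t, c)$ and $\epsilon_\theta(x_t, \varnothing)$ are the conditional and unconditional noise predictions. Expressing the schedule in terms of normalized progress $s$ over $N$ sampling steps is a convention assumed here for comparing schedules, not necessarily the paper's parameterization. A constant schedule $w(t) \equiv w$ recovers standard CFG.

```latex
\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w(t)\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr),
\qquad s_i = \frac{i}{N - 1} \in [0, 1], \quad i = 0, \dots, N - 1
```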
The key insight is that linear-decreasing guidance, which starts high to lock in structure and tapers off as details solidify, best matches the natural energy dynamics of the denoising process.
We evaluate four schedule families; their results are compared in the table below.
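The schedule families named above can be sketched as functions of normalized progress `s` in [0, 1] (`s = 0` at the first, noisiest step). The endpoint values `W_MAX` and `W_MIN`, the half-cosine decay, and the step schedule's switch point are illustrative assumptions, not values taken from RectifiedHR.

```python
import math
from typing import Callable, Dict, List

# Illustrative guidance endpoints (assumed, not from the paper).
W_MAX = 7.5
W_MIN = 3.0


def constant(s: float, w: float = W_MAX) -> float:
    return w


def linear_increasing(s: float, lo: float = W_MIN, hi: float = W_MAX) -> float:
    return lo + (hi - lo) * s


def linear_decreasing(s: float, lo: float = W_MIN, hi: float = W_MAX) -> float:
    # Strong guidance early (coarse structure), tapering off as details form.
    return hi - (hi - lo) * s


def cosine(s: float, lo: float = W_MIN, hi: float = W_MAX) -> float:
    # Smooth decay from hi to lo over half a cosine period (one assumed variant).
    return lo + (hi - lo) * 0.5 * (1.0 + math.cos(math.pi * s))


def step_schedule(s: float, lo: float = W_MIN, hi: float = W_MAX, switch: float = 0.5) -> float:
    # Hard switch from hi to lo at an assumed switch point.
    return hi if s < switch else lo


SCHEDULES: Dict[str, Callable[[float], float]] = {
    "constant": constant,
    "linear_increasing": linear_increasing,
    "linear_decreasing": linear_decreasing,
    "cosine": cosine,
    "step": step_schedule,
}


def guidance_scales(name: str, num_steps: int) -> List[float]:
    """Per-step guidance scales w(t) for the named schedule over `num_steps` steps."""
    fn = SCHEDULES[name]
    if num_steps == 1:
        return [fn(0.0)]
    return [fn(i / (num_steps - 1)) for i in range(num_steps)]


print(guidance_scales("linear_decreasing", 4))  # [7.5, 6.0, 4.5, 3.0]
```

Keeping every schedule as a plain function of `s` makes them interchangeable in a sampler loop and decouples the schedule shape from the number of sampling steps.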
We pair adaptive CFG with DPM++ 2M, a second-order multi-step solver that is particularly sensitive to guidance quality. This combination yields the most stable energy trajectories and the best perceptual results.
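To show where a per-step scale plugs into the solver, here is a minimal DPM++ 2M loop in the data-prediction (x0) form, modeled on k-diffusion's `sample_dpmpp_2m`, with guidance applied to the denoised predictions at each step. The function names, toy denoisers, and sigma values are hypothetical; a real pipeline would evaluate the diffusion model's conditional and unconditional branches at every step.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]
Denoiser = Callable[[Vec, float], Vec]  # (x, sigma) -> predicted clean sample


def cfg(uncond: Vec, cond: Vec, w: float) -> Vec:
    """Classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for u, c in zip(uncond, cond)]


def sample_dpmpp_2m_cfg(
    x: Vec,
    sigmas: Sequence[float],
    cond_model: Denoiser,
    uncond_model: Denoiser,
    scales: Sequence[float],
) -> Vec:
    """DPM++ 2M sampler with guidance scale scales[i] at step i.

    `sigmas` is decreasing and ends at 0.0; len(scales) == len(sigmas) - 1.
    """
    old_denoised = None
    h_last = 0.0
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        denoised = cfg(uncond_model(x, sigma), cond_model(x, sigma), scales[i])
        if sigma_next == 0.0:
            # Final step: jump to the guided clean-sample estimate.
            x = denoised
        else:
            # Step size in log-SNR time t = -log(sigma).
            h = math.log(sigma) - math.log(sigma_next)
            if old_denoised is None:
                d = denoised  # First step: first-order update.
            else:
                # Second-order multistep correction using the previous prediction.
                r = h_last / h
                d = [(1 + 1 / (2 * r)) * a - (1 / (2 * r)) * b
                     for a, b in zip(denoised, old_denoised)]
            ratio = sigma_next / sigma
            coef = -math.expm1(-h)  # 1 - exp(-h)
            x = [ratio * xi + coef * di for xi, di in zip(x, d)]
            h_last = h
        old_denoised = denoised
    return x


# Hypothetical toy denoisers that always predict a constant clean sample.
def cond(x: Vec, sigma: float) -> Vec:
    return [1.0 for _ in x]


def uncond(x: Vec, sigma: float) -> Vec:
    return [0.0 for _ in x]


sigmas = [10.0, 5.0, 2.0, 0.5, 0.0]
scales = [7.5, 6.0, 4.5, 3.0]  # e.g. a linear-decreasing schedule
out = sample_dpmpp_2m_cfg([0.3, -1.2], sigmas, cond, uncond, scales)
print(out)  # [3.0, 3.0]
```

Because the denoised estimate is affine in the noise prediction (x0 = x - sigma * eps), applying CFG to x0 predictions is equivalent to applying it to the eps predictions in the CFG formula. The second-order update reuses the previous step's guided prediction, which is one way guidance changes can propagate across steps in a multistep solver.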
| Schedule | Stability Score ↑ | Consistency ↑ | Artifacts |
|---|---|---|---|
| Constant CFG (baseline) | 0.9821 | 0.9614 | Visible at 512+ |
| Linear-increasing | 0.9877 | 0.9702 | Moderate |
| Linear-decreasing (DPM++ 2M) | 0.9998 | 0.9873 | Minimal |
| Cosine | 0.9934 | 0.9801 | Low |
Stability score = 1 − normalized variance of latent energy across steps. Higher is better.
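The stability score can be computed from an energy trajectory as sketched below. The normalization is not specified above; this sketch assumes the trajectory is divided by its mean and the population variance is used, so the score is 1 minus the squared coefficient of variation and does not depend on the absolute energy scale. `stability_score` is a hypothetical helper, not necessarily the paper's exact metric.

```python
import statistics
from typing import Sequence


def stability_score(energies: Sequence[float]) -> float:
    """1 - population variance of the mean-normalized energy trajectory (assumed normalization)."""
    mean = statistics.fmean(energies)
    normalized = [e / mean for e in energies]
    return 1.0 - statistics.pvariance(normalized)


print(stability_score([5.0, 5.0, 5.0]))  # 1.0
print(round(stability_score([4.0, 5.0, 6.0]), 4))  # 0.9733
```

A perfectly flat trajectory scores 1.0, and larger step-to-step energy swings lower the score.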
@article{sanjyal2025rectifiedhr,
title = {RectifiedHR: High-Resolution Diffusion via Energy Profiling and Adaptive Guidance Scheduling},
author = {Sanjyal, Ankit},
journal = {arXiv preprint arXiv:2507.09441},
year = {2025}
}