Calibri: Enhancing Diffusion Transformers via
Parameter-Efficient Calibration

Danil Tokhchukov1,* Aysel Mirzoeva1 Andrey Kuznetsov2,3 Konstantin Sobolev1,2,*,†
1MSU    2FusionBrain Lab    3AXXX
CVPR 2026
*These authors contributed equally.
†Project lead; correspondence: ksobolev.info@gmail.com
Calibri Visual Abstract

Calibri is a parameter-efficient method for diffusion transformer calibration: by optimizing only ~10² parameters, it consistently enhances generation quality across SOTA models.


Abstract

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this insight, we propose Calibri, a parameter-efficient approach that optimally calibrates DiT components to elevate generative quality.

Calibri frames DiT calibration as a black-box reward optimization problem, efficiently solved with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) by modifying just ~10² parameters. Despite its lightweight design, Calibri consistently improves performance across various text-to-image models and notably reduces the number of inference steps required, all while maintaining high-quality outputs.


Method

Key Observation: DiT blocks are not fully utilized during inference. A single scalar multiplier applied to the output of a block is sufficient to meaningfully change the quality of generated images — no re-training required. This motivated Calibri: find the optimal set of per-block scaling coefficients.
Motivational plots: ImageReward (IR) as a function of the scaled block index, and IR as a function of the output scale.

Motivational Experiment: Our findings reveal that the contribution of DiT blocks is not fully optimized; their performance can be enhanced through straightforward output scaling with a scalar multiplier.

The DiT architecture processes visual and text tokens through a sequence of transformer blocks. Calibri inserts a learnable scalar at each block output, which acts as a fine-grained "volume knob" for that block's contribution to the denoising trajectory.
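The per-block "volume knob" can be illustrated with a minimal sketch. The toy `block` function and the residual wiring below are assumptions for illustration, not the actual DiT architecture; the point is only that a scale of 1.0 recovers the base model and a scale of 0.0 silences the block entirely.

```python
def block(x):
    # Toy stand-in for a DiT transformer block's update (hypothetical math;
    # a real block applies attention and an MLP to the token sequence).
    return [0.5 * v + 0.1 for v in x]

def forward(x, scales):
    # Calibri-style calibration: multiply each block's output by a learned
    # scalar before the residual addition. scales = [1.0, 1.0, ...] is the
    # unmodified network; scale 0.0 skips the block's contribution.
    for s in scales:
        x = [xi + s * bi for xi, bi in zip(x, block(x))]
    return x

x0 = [1.0, -1.0, 0.5]
base = forward(x0, [1.0] * 4)                    # base model behavior
calibrated = forward(x0, [0.9, 1.2, 1.0, 0.8])   # hypothetical coefficients
```

With one scalar per block, a ~30-block DiT yields a search space of only a few dozen numbers.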

Since the search space has only ~100 dimensions, we solve the calibration problem with CMA-ES, an evolutionary black-box optimizer that requires no gradient signal through the network. The reward is evaluated directly (e.g., the HPSv3 score), making Calibri applicable to any preference model, differentiable or not.
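The black-box loop can be sketched in a few lines. To keep the example self-contained, the code below uses a simplified (μ, λ) evolution strategy as a stand-in for full CMA-ES (no covariance adaptation), and a toy quadratic reward in place of HPSv3; a real run would call a library such as pycma and score generated images with the preference model.

```python
import random

def toy_reward(scales):
    # Stand-in for a preference score such as HPSv3: peaks when every
    # per-block scale sits at a hypothetical optimum of 1.1.
    return -sum((s - 1.1) ** 2 for s in scales)

def simple_es(dim, iters=200, pop=16, sigma=0.2, seed=0):
    """Simplified (mu, lambda) evolution strategy: sample candidates around
    the current mean, keep the best half, recenter, shrink the step size."""
    rng = random.Random(seed)
    mean = [1.0] * dim                 # identity scaling = base model
    for _ in range(iters):
        cands = [[m + rng.gauss(0, sigma) for m in mean] for _ in range(pop)]
        cands.sort(key=toy_reward, reverse=True)
        elite = cands[: pop // 2]
        mean = [sum(c[i] for c in elite) / len(elite) for i in range(dim)]
        sigma *= 0.99                  # slowly anneal the step size
    return mean

best = simple_es(dim=12)               # e.g., 12 calibrated blocks
```

Because each reward evaluation is just a forward pass plus a scoring call, the whole search needs no backpropagation through the DiT.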

CMA-ES evolutionary optimization scheme
Figure 1. The Calibri optimization loop. CMA-ES iterates over candidate scaling vectors, evaluates each via a reward model, and updates the distribution to converge on optimal calibration coefficients.

Calibri Ensemble

We generalize Calibri to the case where multiple model outputs participate in inference, such as Classifier-Free Guidance (CFG). Specifically, we optimize for an ensemble of \(N\) models simultaneously, where the combined output \( F \) is:

\[ F^{\{s_i\}_{i=1}^N}(x, t, p) = \sum_{i=1}^N \omega_i \, f^{s_i}_{\theta}(x, t, c_i), \qquad c_i \in \{p, \emptyset\}, \]

where \( \omega_i \) denotes the weight assigned to the \( i \)-th model, \( f^{s_i}_{\theta} \) represents the model calibrated with parameter set \( s_i \), and \( c_i \) is its conditioning: either the prompt \( p \) or the null prompt \( \emptyset \). As shown below, Calibri at 15 NFE surpasses the base model running at 50 NFE.
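For N = 2, this weighted combination recovers standard classifier-free guidance. A minimal numeric sketch (the model outputs and the guidance scale g are toy values chosen for illustration):

```python
def ensemble(outputs, weights):
    # Combined prediction F = sum_i w_i * f_i, elementwise over the
    # outputs of N (calibrated) model evaluations.
    return [sum(w, ) for w in []]  # placeholder replaced below

def ensemble(outputs, weights):
    return [sum(w * o[j] for w, o in zip(weights, outputs))
            for j in range(len(outputs[0]))]

# Classifier-free guidance as the N = 2 special case:
# F = f(x,t,emptyset) + g * (f(x,t,p) - f(x,t,emptyset))
#   = g * f_cond + (1 - g) * f_uncond.
g = 5.0                  # hypothetical guidance scale
f_cond = [0.2, 0.4]      # toy output conditioned on the prompt p
f_uncond = [0.1, 0.1]    # toy output with the null prompt
cfg = ensemble([f_cond, f_uncond], [g, 1.0 - g])
```

Calibri optimizes the weights \( \omega_i \) jointly with the per-model scale sets \( s_i \), so guidance strength and block calibration are tuned together.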

HPSv3 vs Inference Steps
Figure 2. Calibri vs. the base model across inference steps. The x-axis shows the number of function evaluations (NFE), and the y-axis shows the HPSv3 score. Calibri (upper curve) dominates the base model at every NFE budget; at low step counts it already exceeds the base model's best score at any budget.

Results

We apply Calibri to three SOTA text-to-image models: FLUX.1-dev, Stable Diffusion 3.5 Medium, and Qwen-Image. Calibri consistently improves all preference metrics (HPSv3, ImageReward, Q-Align) while simultaneously cutting inference cost — up to 2–3× fewer function evaluations.

Table 1: Quantitative evaluation of generation quality improvements across baseline models.
Model          Calibri   HPSv3 ↑   ImageReward ↑   Q-Align ↑   NFE ↓
FLUX.1-dev     ✗         11.41     1.15            4.85        30
FLUX.1-dev     ✓         13.48     1.18            4.88        15
SD-3.5 Medium  ✗         11.15     1.10            4.74        80
SD-3.5 Medium  ✓         14.10     1.17            4.91        30
Qwen-Image     ✗         11.26     1.16            4.55        100
Qwen-Image     ✓         12.95     1.18            4.73        30
Table 2: Human evaluation: Calibri vs. baselines, win rates (%).
Methods      |  Overall Preference          |  Text Alignment
             |  Calibri   Equal   Original  |  Calibri   Equal   Original
Flux         |  51.87     7.33    40.80     |  38.71     37.68   23.61
Qwen-Image   |  54.62     7.91    37.47     |  40.29     37.65   22.06
Qualitative Comparison
Figure 3. Qualitative comparisons on diverse prompts. Calibri-calibrated models (bottom rows) produce images with higher detail, better compositional coherence, and more vivid appearance compared to the base model outputs (top rows).

Combining Calibri with Alignment Methods

Calibri is not a replacement for RL-based alignment; it is composable with it. We show that Calibri can be applied on top of Flow-GRPO fine-tuned models to further push performance, with 10⁵ fewer parameters than the fine-tuned backbone. Crucially, Calibri allows targeting a different metric from the original fine-tuning objective.

Table 3: Comparison of Calibri and Flow-GRPO on SD-3.5M. Calibri achieves comparable performance with 10⁵ fewer parameters and can be combined with alignment methods to boost either the same or different target metrics.
Flow-GRPO   Calibri   Target      HPSv3 ↑   PickScore ↑   Q-Align ↑   NFE ↓
✗           ✗         –           11.15     22.40         4.74        80
✗           ✓         PickScore   12.47     23.13         4.91        30
✓           ✗         PickScore   12.67     23.78         4.92        80
✓           ✓         PickScore   12.96     23.93         4.85        30
✓           ✗         GenEval     10.16     22.22         4.69        80
✓           ✓         HPSv3       14.18     22.22         4.88        30
Alignment Results
Figure 4. Visual comparison of base model, Flow-GRPO aligned model, and the same model with Calibri applied. Calibri adds a further layer of refinement on top of the RL-trained checkpoint, improving both visual aesthetics and compositional accuracy simultaneously.

BibTeX

@article{tokhchukov2026calibri,
  title     = {Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration},
  author    = {Tokhchukov, Danil and Mirzoeva, Aysel and Kuznetsov, Andrey and Sobolev, Konstantin},
  journal   = {arXiv preprint arXiv:XXXX.XXXXX},
  year      = {2026}
}