Calibri: Enhancing Diffusion Transformers via
Parameter-Efficient Calibration

Danil Tokhchukov1,* Aysel Mirzoeva1 Andrey Kuznetsov2,3 Konstantin Sobolev1,2,*,†
1MSU    2FusionBrain Lab    3AXXX
CVPR 2026
*These authors contributed equally.
†Project lead; correspondence: ksobolev.info@gmail.com
Calibri Visual Abstract

Calibri is a parameter-efficient method for diffusion transformer calibration: by optimizing only ~10² parameters, it consistently enhances generation quality across SOTA models.


Abstract

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can significantly improve the performance of DiT blocks. Building on this insight, we propose Calibri, a parameter-efficient approach that optimally calibrates DiT components to elevate generative quality.

Calibri frames DiT calibration as a black-box reward optimization problem, efficiently solved with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) by modifying just ~10² parameters. Despite its lightweight design, Calibri consistently improves performance across various text-to-image models and notably reduces the number of inference steps required, all while maintaining high-quality outputs.


Method

Key Observation: DiT blocks are not fully utilized during inference. A single scalar multiplier applied to the output of a block is sufficient to meaningfully change the quality of generated images — no re-training required. This motivated Calibri: find the optimal set of per-block scaling coefficients.
Motivational plots: ImageReward (IR) as a function of the scaled block index, and IR as a function of the output scale.

Motivational Experiment: Our findings reveal that the contribution of DiT blocks is not fully optimized; their performance can be enhanced through straightforward output scaling with a scalar multiplier.

The DiT architecture processes visual and text tokens through a sequence of transformer blocks. Calibri inserts a learnable scalar at each block output, which acts as a fine-grained "volume knob" for that block's contribution to the denoising trajectory.
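The per-block "volume knob" can be illustrated with a minimal sketch. The toy `block` function and the residual wiring below are assumptions for illustration, not the actual DiT architecture; the point is only that a scale of 1.0 recovers the base model and a scale of 0.0 silences the block entirely.

```python
def block(x):
    # Toy stand-in for a DiT transformer block's update (hypothetical math;
    # a real block applies attention and an MLP to the token sequence).
    return [0.5 * v + 0.1 for v in x]

def forward(x, scales):
    # Calibri-style calibration: multiply each block's output by a learned
    # scalar before the residual addition. scales = [1.0, 1.0, ...] is the
    # unmodified network; scale 0.0 skips the block's contribution.
    for s in scales:
        x = [xi + s * bi for xi, bi in zip(x, block(x))]
    return x

x0 = [1.0, -1.0, 0.5]
base = forward(x0, [1.0] * 4)                    # base model behavior
calibrated = forward(x0, [0.9, 1.2, 1.0, 0.8])   # hypothetical coefficients
```

With one scalar per block, a ~30-block DiT yields a search space of only a few dozen numbers.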

Since the search space has only ~100 dimensions, we solve the calibration problem with CMA-ES, an evolutionary black-box optimizer that requires no gradient signal through the network. The reward is evaluated directly (e.g., the HPSv3 score), making Calibri applicable to any preference model, differentiable or not.
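The black-box loop can be sketched in a few lines. To keep the example self-contained, the code below uses a simplified (μ, λ) evolution strategy as a stand-in for full CMA-ES (no covariance adaptation), and a toy quadratic reward in place of HPSv3; a real run would call a library such as pycma and score generated images with the preference model.

```python
import random

def toy_reward(scales):
    # Stand-in for a preference score such as HPSv3: peaks when every
    # per-block scale sits at a hypothetical optimum of 1.1.
    return -sum((s - 1.1) ** 2 for s in scales)

def simple_es(dim, iters=200, pop=16, sigma=0.2, seed=0):
    """Simplified (mu, lambda) evolution strategy: sample candidates around
    the current mean, keep the best half, recenter, shrink the step size."""
    rng = random.Random(seed)
    mean = [1.0] * dim                 # identity scaling = base model
    for _ in range(iters):
        cands = [[m + rng.gauss(0, sigma) for m in mean] for _ in range(pop)]
        cands.sort(key=toy_reward, reverse=True)
        elite = cands[: pop // 2]
        mean = [sum(c[i] for c in elite) / len(elite) for i in range(dim)]
        sigma *= 0.99                  # slowly anneal the step size
    return mean

best = simple_es(dim=12)               # e.g., 12 calibrated blocks
```

Because each reward evaluation is just a forward pass plus a scoring call, the whole search needs no backpropagation through the DiT.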

CMA-ES evolutionary optimization scheme
Figure 1. The Calibri optimization loop. CMA-ES iterates over candidate scaling vectors, evaluates each via a reward model, and updates the distribution to converge on optimal calibration coefficients.

Calibri Ensemble

We generalize Calibri to the case where multiple model outputs participate in inference, such as Classifier-Free Guidance (CFG). Specifically, we optimize for an ensemble of \(N\) models simultaneously, where the combined output \( F \) is:

\[ F^{\{s_i\}_{i=1}^N}(x, t, p) = \sum_{i=1}^N \omega_i \, f^{s_i}_{\theta}(x, t, c_i), \qquad c_i \in \{p, \emptyset\}, \]

where \( \omega_i \) denotes the weight assigned to the \( i \)-th model, \( f^{s_i}_{\theta} \) represents the model calibrated with parameter set \( s_i \), and \( c_i \) is its conditioning: either the prompt \( p \) or the null prompt \( \emptyset \). As shown below, Calibri at 15 NFE surpasses the base model running at 50 NFE.
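For N = 2, this weighted combination recovers standard classifier-free guidance. A minimal numeric sketch (the model outputs and the guidance scale g are toy values chosen for illustration):

```python
def ensemble(outputs, weights):
    # Combined prediction F = sum_i w_i * f_i, elementwise over the
    # outputs of N (calibrated) model evaluations.
    return [sum(w, ) for w in []]  # placeholder replaced below

def ensemble(outputs, weights):
    return [sum(w * o[j] for w, o in zip(weights, outputs))
            for j in range(len(outputs[0]))]

# Classifier-free guidance as the N = 2 special case:
# F = f(x,t,emptyset) + g * (f(x,t,p) - f(x,t,emptyset))
#   = g * f_cond + (1 - g) * f_uncond.
g = 5.0                  # hypothetical guidance scale
f_cond = [0.2, 0.4]      # toy output conditioned on the prompt p
f_uncond = [0.1, 0.1]    # toy output with the null prompt
cfg = ensemble([f_cond, f_uncond], [g, 1.0 - g])
```

Calibri optimizes the weights \( \omega_i \) jointly with the per-model scale sets \( s_i \), so guidance strength and block calibration are tuned together.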

HPSv3 vs Inference Steps
Figure 2. Calibri vs. the base model across inference steps. The x-axis shows the number of function evaluations (NFE), and the y-axis shows the HPSv3 score. Calibri (upper curve) dominates the base model at every NFE budget; at low step counts it already exceeds the base model's best score at any budget.

Results

We apply Calibri to three SOTA text-to-image models: FLUX.1-dev, Stable Diffusion 3.5 Medium, and Qwen-Image. Calibri consistently improves all preference metrics (HPSv3, ImageReward, Q-Align) while simultaneously cutting inference cost — up to 2–3× fewer function evaluations.

Table 1: Quantitative evaluation of generation quality improvements across baseline models.
Model          Calibri   HPSv3 ↑   ImageReward ↑   Q-Align ↑   NFE ↓
FLUX.1-dev     ✗         11.41     1.15            4.85        30
FLUX.1-dev     ✓         13.48     1.18            4.88        15
SD-3.5 Medium  ✗         11.15     1.10            4.74        80
SD-3.5 Medium  ✓         14.10     1.17            4.91        30
Qwen-Image     ✗         11.26     1.16            4.55        100
Qwen-Image     ✓         12.95     1.18            4.73        30
Table 2: Human evaluation: Calibri vs. baselines, win rates (%).
Methods      |  Overall Preference          |  Text Alignment
             |  Calibri   Equal   Original  |  Calibri   Equal   Original
Flux         |  51.87     7.33    40.80     |  38.71     37.68   23.61
Qwen-Image   |  54.62     7.91    37.47     |  40.29     37.65   22.06
Qualitative Comparison
Figure 3. Qualitative comparisons on diverse prompts. Calibri-calibrated models (bottom rows) produce images with higher detail, better compositional coherence, and more vivid appearance compared to the base model outputs (top rows).

Combining Calibri with Alignment Methods

Calibri is not a replacement for RL-based alignment; it is composable with it. We show that Calibri can be applied on top of Flow-GRPO fine-tuned models to further push performance, with 10⁵ fewer parameters than the fine-tuned backbone. Crucially, Calibri allows targeting a different metric from the original fine-tuning objective.

Table 3: Comparison of Calibri and Flow-GRPO on SD-3.5M. Calibri achieves comparable performance with 10⁵ fewer parameters and can be combined with alignment methods to boost either the same or different target metrics.
Flow-GRPO   Calibri   Target      HPSv3 ↑   PickScore ↑   Q-Align ↑   NFE ↓
✗           ✗         –           11.15     22.40         4.74        80
✗           ✓         PickScore   12.47     23.13         4.91        30
✓           ✗         PickScore   12.67     23.78         4.92        80
✓           ✓         PickScore   12.96     23.93         4.85        30
✓           ✗         GenEval     10.16     22.22         4.69        80
✓           ✓         HPSv3       14.18     22.22         4.88        30
Alignment Results
Figure 4. Visual comparison of base model, Flow-GRPO aligned model, and the same model with Calibri applied. Calibri adds a further layer of refinement on top of the RL-trained checkpoint, improving both visual aesthetics and compositional accuracy simultaneously.

BibTeX

@article{tokhchukov2026calibri,
  title     = {Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration},
  author    = {Tokhchukov, Danil and Mirzoeva, Aysel and Kuznetsov, Andrey and Sobolev, Konstantin},
  journal   = {arXiv preprint arXiv:XXXX.XXXXX},
  year      = {2026}
}