Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

Runqian Wang¹ and Yilun Du²
¹MIT, ²Harvard
TL;DR: Equilibrium Matching (EqM) exceeds Flow Matching in generation quality, supports optimization-based sampling, and solves downstream tasks naturally.

Conceptual 2D Visualization. We compare the conceptual 2D dynamics of Equilibrium Matching and Flow Matching with two ground truths (marked by stars). EqM learns a time-invariant gradient field that always converges to the ground truths, whereas FM learns a time-conditional velocity field that converges to the ground truths only at t=1. We also compare the actual sampling processes of the two methods: under identical step sizes and numbers of steps, EqM converges much faster than FM.


Abstract

We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics in traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit energy landscape. Through this approach, we can adopt an optimization-based sampling process at inference time, where samples are obtained by gradient descent on the learned landscape with adjustable step sizes, adaptive optimizers, and adaptive compute. EqM surpasses the generation performance of diffusion/flow models empirically, achieving an FID of 1.90 on ImageNet 256$\times$256. EqM is also theoretically justified to learn and sample from the data manifold. Beyond generation, EqM is a flexible framework that naturally handles tasks including partially noised image denoising, OOD detection, and image composition. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models and a simple route to optimization-driven inference.


Generation Performance

Equilibrium Matching is theoretically guaranteed to learn the data manifold and produce samples from this manifold using gradient descent. Empirically, Equilibrium Matching achieves 1.90 FID on ImageNet 256x256 generation, outperforming existing diffusion and flow-based counterparts in generation quality. Equilibrium Matching also exhibits strong scaling behavior, exceeding the flow-based counterpart at all tested scales.



We present the generation process of our EqM-XL/2 model.



Class-Conditional ImageNet 256x256 Generation. EqM-XL/2 achieves a 1.90 FID, surpassing other tested methods.


Optimization-Based Sampling

Because it learns a single equilibrium dynamics, Equilibrium Matching supports optimization-based sampling, where samples are obtained by gradient descent on the learned landscape. Unlike existing diffusion samplers, which integrate along a prescribed trajectory, optimization-based sampling admits different step sizes and adaptive optimizers; existing gradient-optimization techniques such as Nesterov Accelerated Gradient (NAG) can be adopted directly for better generation quality. Equilibrium Matching can also allocate inference-time compute adaptively: it adjusts the number of sampling steps for each sample independently based on the gradient norm, saving up to 60% of function evaluations.
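The gradient-descent sampler can be sketched in a few lines. This is a minimal, self-contained illustration: in EqM the gradient field would come from a trained network, so a toy quadratic energy E(x) = 0.5·||x − mu||², with gradient x − mu, stands in for it here; the names `toy_grad` and `sample_gd` are illustrative, not from the paper's code.

```python
# Optimization-based sampling sketch: plain gradient descent on a learned
# equilibrium landscape. A toy quadratic energy stands in for the trained
# EqM gradient network; its minimum mu plays the role of a data point.

def toy_grad(x, mu=(1.0, -2.0)):
    # Gradient of the toy energy E(x) = 0.5 * ||x - mu||^2, i.e. x - mu.
    return [xi - mi for xi, mi in zip(x, mu)]

def sample_gd(grad_fn, x0, step_size=0.1, num_steps=200):
    # Plain gradient descent: x <- x - eta * grad(x).
    x = list(x0)
    for _ in range(num_steps):
        g = grad_fn(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x

sample = sample_gd(toy_grad, x0=[5.0, 5.0])
print(sample)  # converges toward mu = (1.0, -2.0)
```

Because sampling is an optimization rather than trajectory integration, the step size and step count here are free knobs rather than quantities fixed by a noise schedule.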



Left: Sampling with Nesterov Accelerated Gradient. NAG-GD achieves better sample quality than GD, with the gap being larger when using fewer steps.


Middle: Different Sampling Step Sizes. EqM is robust to a wide range of step sizes, whereas Flow Matching only functions properly at one specific step size.


Right: Total Steps Under Adaptive Compute. EqM assigns different numbers of steps for each sample, adaptively adjusting compute at inference time.
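The NAG sampler and the gradient-norm-based adaptive compute described above can be sketched together. This uses the same toy quadratic stand-in for the trained gradient network; the `momentum` and `grad_tol` values and the stopping rule are illustrative choices, not the paper's exact settings.

```python
import math

def toy_grad(x, mu=(1.0, -2.0)):
    # Stand-in for the trained EqM gradient network (toy quadratic energy).
    return [xi - mi for xi, mi in zip(x, mu)]

def sample_nag(grad_fn, x0, step_size=0.1, momentum=0.9,
               max_steps=500, grad_tol=1e-4):
    # Nesterov Accelerated Gradient: evaluate the gradient at a momentum
    # look-ahead point, and stop early once the gradient norm drops below
    # grad_tol -- per-sample adaptive compute.
    x, v = list(x0), [0.0] * len(x0)
    for step in range(1, max_steps + 1):
        lookahead = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad_fn(lookahead)
        if math.sqrt(sum(gi * gi for gi in g)) < grad_tol:
            break  # near an equilibrium point: stop spending compute
        v = [momentum * vi - step_size * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x, step

sample, steps_used = sample_nag(toy_grad, x0=[5.0, 5.0])
print("converged in", steps_used, "steps")
```

Different starting points sit at different distances from equilibrium, so `steps_used` varies per sample, which is the mechanism behind the adaptive-compute savings reported above.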


Unique Properties

Equilibrium Matching demonstrates unique properties that traditional diffusion/flow-based models lack. Equilibrium Matching can generate high-quality samples directly from partially noised inputs, whereas flow-based models only perform well when starting from pure noise. Moreover, Equilibrium Matching can perform out-of-distribution (OOD) detection without relying on any external module. We also show that different Equilibrium Matching models can be added together to generate compositional images, much as EBMs can be composed. Our results show that Equilibrium Matching offers capabilities unseen in traditional diffusion/flow models.
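One plausible way module-free OOD detection could look under this framework: since the equilibrium gradient vanishes on the data manifold, its norm at a test point indicates how far the point sits from the learned minima. The sketch below is an illustrative heuristic under that assumption, not necessarily the paper's exact procedure; the toy gradient again stands in for a trained network.

```python
import math

def toy_grad(x, mu=(0.0, 0.0)):
    # Stand-in for a trained EqM gradient network whose minima lie on the
    # data manifold; here the "manifold" is the single point mu.
    return [xi - mi for xi, mi in zip(x, mu)]

def ood_score(grad_fn, x):
    # Near the data manifold the equilibrium gradient vanishes, so its norm
    # can serve as an OOD score (illustrative heuristic, not the paper's
    # exact method): larger norm = farther from the learned minima.
    g = grad_fn(x)
    return math.sqrt(sum(gi * gi for gi in g))

in_dist = ood_score(toy_grad, [0.1, -0.1])  # close to the minimum
out_dist = ood_score(toy_grad, [4.0, 3.0])  # far from the minimum
print(in_dist < out_dist)  # True
```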



Partially Noised Image Generation. EqM generates realistic reconstructions, while Flow Matching fails and leaves the input as largely unchanged noise.



Image Composition. We can compose different class-conditional EqM models by directly adding their gradients together, just as with EBMs.
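The gradient-summing composition can be sketched in one dimension. This is a minimal illustration with two toy quadratic energies standing in for trained class-conditional models; descending their summed gradients reaches the minimum of the combined landscape, mirroring how EBM energies compose by addition.

```python
def grad_a(x):
    # Gradient of toy energy E_a(x) = 0.5 * (x - 2)^2
    # (stand-in for one class-conditional EqM model).
    return x - 2.0

def grad_b(x):
    # Gradient of toy energy E_b(x) = 0.5 * (x + 4)^2 (a second model).
    return x + 4.0

def composed_grad(x):
    # Composition as with EBMs: summing gradients descends the summed
    # landscape E_a + E_b, whose minimum balances both models.
    return grad_a(x) + grad_b(x)

def sample_gd(grad_fn, x0, step_size=0.1, num_steps=300):
    # Plain gradient-descent sampler on the composed gradient field.
    x = x0
    for _ in range(num_steps):
        x -= step_size * grad_fn(x)
    return x

x = sample_gd(composed_grad, x0=10.0)
print(round(x, 3))  # minimum of E_a + E_b lies at x = -1.0
```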


BibTeX

@misc{Wang2025,
  title        = {Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models},
  author       = {Runqian Wang and Yilun Du},
  howpublished = {arXiv:2510.02300},
  year         = {2025},
  note         = {preprint},
  url          = {https://arxiv.org/abs/2510.02300},
  doi          = {10.48550/arXiv.2510.02300}
}