Submission No.: 0208
Section: Special Session
Session: (SS-17) Scientific Computing and Machine Learning
Time: 20th-D, 14:00--14:30
Title (Eng.): Maximum entropy inverse reinforcement learning of diffusion models and energy-based models
Author(s): Sangwoong Yoon¹, Dohyun Kwon², Himchan Hwang³, Yung-Kyun Noh⁴, Frank C. Park³
¹KIAS, ²University of Seoul, ³Seoul National University, ⁴Hanyang University
Abstract: We present an inverse reinforcement learning (IRL) approach for training diffusion generative models, which results in joint training with an energy-based model (EBM). We consider training (or fine-tuning) a diffusion model through reinforcement learning with log data density as the reward. The IRL approach is natural in this context, as log data density is unknown and must be estimated from training data using a separate model, such as an EBM. In this paper, we introduce Generalized Contrastive Divergence (GCD), a minimax objective function that is equivalent to maximum entropy (MaxEnt) IRL of a diffusion model and an EBM. The MaxEnt regularization in GCD ensures that both models converge to the data distribution. The key subroutine in GCD learning is the MaxEnt reinforcement learning of a diffusion model. To perform diffusion model updates more effectively, we propose the Diffusion Soft Actor-Critic (DSAC), a novel MaxEnt RL algorithm for diffusion models. Our empirical study finds that DSAC converges significantly faster than policy gradient methods while achieving better performance. Diffusion models fine-tuned with GCD and DSAC can generate high-quality samples with a small number of steps. Additionally, GCD facilitates the training of an EBM without MCMC, stabilizing EBM training dynamics and improving outlier detection performance.
MSC number(s): 68T07
Keyword(s): Diffusion models, energy-based models, generative models, reinforcement learning, inverse reinforcement learning
Language of Session (Talk): Korean
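
As a reading aid, the minimax structure that the abstract attributes to GCD can be sketched in generic MaxEnt IRL form, with the EBM energy E_θ playing the role of a negative reward and π_φ denoting the diffusion sampler. This is a schematic form under standard conventions, not the exact GCD objective defined in the paper:

\[
\min_{\theta}\;\max_{\phi}\;\;
\mathbb{E}_{x\sim\pi_{\phi}}\!\left[-E_{\theta}(x)\right]
+ \mathcal{H}\!\left(\pi_{\phi}\right)
- \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[-E_{\theta}(x)\right].
\]

Read this way, the inner maximization is the MaxEnt RL subroutine the abstract assigns to DSAC: the diffusion sampler maximizes the learned reward -E_θ plus its own entropy. The outer minimization lowers the energy on data and raises it on generated samples, so the EBM is updated from sampler outputs rather than MCMC chains, consistent with the MCMC-free EBM training described above.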