description [ICLR 2026][Image Generation][Diffusion model fine-tuning] This paper proposes SQDF (Soft Q-based Diffusion Finetuning), which fine-tunes diffusion models under a KL-regularized RL ...
This material describes the Q-function approximator for DQN/DDQN and the policy-value network used in PPO. Key components include experience replay and target network updates for DQN-based methods, and ...
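
The two DQN-side components named above, experience replay and target network updates, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the described material's own code: the names `ReplayBuffer`, `hard_update`, `dqn_target`, and `ddqn_target` are hypothetical, network parameters are stood in for by a plain dictionary of floats, and Q-values are plain Python lists instead of network outputs.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def hard_update(target_params, online_params):
    """Periodic sync: copy every online parameter into the target network."""
    for name, value in online_params.items():
        target_params[name] = value


def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

In a training loop under these assumptions, transitions would be pushed after every environment step, a batch sampled once the buffer holds enough of them, and `hard_update` called every fixed number of steps. The only difference between the two target functions is which network picks the next action, which is the distinction between DQN and DDQN.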