# Soft Actor-Critic

Paper: https://arxiv.org/pdf/1812.05905.pdf

- Actor-critic method, similar to DDPG and TD3
- There is over-estimation bias in the critic in [[Deep Deterministic Policy Gradient (DDPG)]]
- TD3:
	- Has two critics; always takes the minimum of the two to prevent over-estimation
	- Gives a justification for the target actor: a slow update of the policy is necessary
- "Soft" means "entropy regularized"
- Claims to be off-policy (need to check)
- Can use a replay buffer -> improves sample efficiency
- Adds entropy regularization to favor exploration
- Can also be used with a deterministic actor (closer to [[TD3]])
- There are three successive versions

---

## References

1. Video by Olivier Sigaud: https://www.youtube.com/watch?v=CLZkpo8rEGg
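The two ideas noted above (twin critics with a min to curb over-estimation, plus the entropy bonus) can be sketched as a SAC critic-target computation. This is a minimal illustration, not the paper's code: the function names, the scalar inputs, and the `alpha`/`gamma` defaults are assumptions.

```python
import numpy as np

def sac_critic_target(reward, done, next_obs, q1_target, q2_target,
                      sample_action, alpha=0.2, gamma=0.99):
    """y = r + gamma * (1 - done) * (min_i Q_i(s', a') - alpha * log pi(a'|s')).

    Illustrative sketch: q1_target/q2_target are the two target critics,
    sample_action draws a' ~ pi(.|s') and returns (a', log_prob).
    """
    next_action, log_prob = sample_action(next_obs)          # a' ~ pi(.|s')
    q_min = np.minimum(q1_target(next_obs, next_action),     # min of the two critics,
                       q2_target(next_obs, next_action))     # as in TD3, to avoid over-estimation
    # Entropy-regularized ("soft") Bellman backup: subtract alpha * log pi
    return reward + gamma * (1.0 - done) * (q_min - alpha * log_prob)
```

With stand-in critics and policy, e.g. `q1 = lambda s, a: 2.0`, `q2 = lambda s, a: 1.0`, and `pi = lambda s: (0.0, -1.0)`, the target uses the smaller Q-value (1.0) and adds the entropy bonus `-alpha * log_prob = +0.2` before discounting.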