# Soft Actor-Critic
Paper: https://arxiv.org/pdf/1812.05905.pdf
- Actor-critic method, similar to DDPG and TD3
- There is over-estimation bias in the critic in [[Deep Deterministic Policy Gradient (DDPG)]]
- TD3: uses two critics and always takes the minimum of the two to prevent over-estimation
- Also gives a justification for the target actor: a slow (delayed) update of the policy is necessary
- Soft means "entropy regularized"
- Claims to be off-policy (need to check)
- Can use a replay buffer -> improves sample efficiency
- Adds entropy regularization to favor exploration
- Can also be used with deterministic actor (closer to [[TD3]])
- There are three successive versions
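The two key ideas in the bullets above (clipped double-Q from TD3 plus entropy regularization) both show up in the critic's target. A minimal numpy sketch, assuming precomputed target-critic values and the next action's log-probability under the current policy; the function name and signature are mine, not from the paper:

```python
import numpy as np

def soft_td_target(reward, done, q1_next, q2_next, log_prob_next,
                   gamma=0.99, alpha=0.2):
    """Soft Bellman target used to train both SAC critics.

    - min(Q1', Q2'): TD3-style clipped double-Q, counters over-estimation
    - -alpha * log pi(a'|s'): entropy bonus that favors exploration
    """
    min_q = np.minimum(q1_next, q2_next)        # pessimistic value estimate
    soft_value = min_q - alpha * log_prob_next  # entropy-regularized value
    return reward + gamma * (1.0 - done) * soft_value
```

With `alpha = 0` this reduces to the plain TD3 target; the later SAC version tunes `alpha` automatically instead of fixing it.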
---
## References
1. Video by Olivier Sigaud https://www.youtube.com/watch?v=CLZkpo8rEGg