# Incremental Implementation of Estimating Action Values
When computing sample averages, the naive approach would be to maintain a record of all rewards and recompute the average whenever the estimated value is needed. This does not scale in memory or computation with the number of rewards seen. It is straightforward to devise incremental update formulas instead.
Given $Q_{n}$, the average of the first $n-1$ rewards, and the $n$th reward, $R_{n}$, the new average of all $n$ rewards can be computed by
$$
\begin{aligned}
Q_{n+1} &=\frac{1}{n} \sum_{i=1}^{n} R_{i} \\
&=\frac{1}{n}\left(R_{n}+\sum_{i=1}^{n-1} R_{i}\right) \\
&=\frac{1}{n}\left(R_{n}+(n-1) \frac{1}{n-1} \sum_{i=1}^{n-1} R_{i}\right) \\
&=\frac{1}{n}\left(R_{n}+(n-1) Q_{n}\right) \\
&=\frac{1}{n}\left(R_{n}+n Q_{n}-Q_{n}\right) \\
&=Q_{n}+\frac{1}{n}\left[R_{n}-Q_{n}\right]
\end{aligned}
$$
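As a minimal sketch of this incremental sample average (the class and method names below are illustrative, not from the book), the estimate can be maintained with only two numbers per action: the current estimate and the count of rewards seen.

```python
class IncrementalAverage:
    """Tracks Q_n, the running average of rewards, without storing past rewards."""

    def __init__(self):
        self.q = 0.0  # current estimate Q_n (initialised arbitrarily to 0)
        self.n = 0    # number of rewards averaged so far

    def update(self, reward: float) -> float:
        """Incorporate the next reward R_n and return the new estimate Q_{n+1}."""
        self.n += 1
        # Q_{n+1} = Q_n + (1/n) * (R_n - Q_n)
        self.q += (reward - self.q) / self.n
        return self.q


avg = IncrementalAverage()
for r in [1.0, 0.0, 2.0]:
    avg.update(r)
print(avg.q)  # 1.0, the average of the three rewards
```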
This update is of a form that occurs frequently in [[Reinforcement Learning]] problems. The general form is
$$
\text{NewEstimate} \leftarrow \text{OldEstimate} + \text{StepSize}\,[\text{Target} - \text{OldEstimate}]
$$
The expression $[\text{Target} - \text{OldEstimate}]$ is an *error* in the estimate.
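In the sample-average case above the StepSize is $\frac{1}{n}$, which shrinks as more rewards arrive. Written as code (a sketch; the function name is mine, not from the book), the general rule is a one-line update:

```python
def update_estimate(old_estimate: float, target: float, step_size: float) -> float:
    """NewEstimate = OldEstimate + StepSize * (Target - OldEstimate)."""
    error = target - old_estimate      # the error in the estimate
    return old_estimate + step_size * error
```

Using `step_size = 1/n` recovers the sample average derived above, while a constant step size weights recent rewards more heavily.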
---
## References
1. Sutton and Barto, *Reinforcement Learning: An Introduction*, 2nd edition, Section 2.4.