# Rotary Position Embeddings (RoPE)
Instead of adding [[Positional Encoding]] vectors to the input to encode token position, RoPE encodes position by rotating the Q and K vectors. Each position $i$ has a rotation matrix $R_i$:
$Q_i = (X_i W_Q) R_i, \qquad K_j = (X_j W_K) R_j$
When computing attention between positions $i$ and $j$:
$Q_i K_j^T = (X_i W_Q) R_i R_j^T (X_j W_K)^T$
The key property: $R_i R_j^T$ depends only on the relative position $(i - j)$, not absolute positions. This comes from trig identities—terms like $\cos(i\theta)\cos(j\theta) + \sin(i\theta)\sin(j\theta)$ simplify to $\cos((i-j)\theta)$.
**What's the rotation?**
RoPE rotates pairs of dimensions. For a 2D case at position $i$:
$R_i = \begin{pmatrix} \cos(i\theta) & -\sin(i\theta) \\ \sin(i\theta) & \cos(i\theta) \end{pmatrix}$
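Multiplying out the 2D case with the trig identities mentioned above makes the relative-position property explicit:
$R_i R_j^T = \begin{pmatrix} \cos(i\theta) & -\sin(i\theta) \\ \sin(i\theta) & \cos(i\theta) \end{pmatrix} \begin{pmatrix} \cos(j\theta) & \sin(j\theta) \\ -\sin(j\theta) & \cos(j\theta) \end{pmatrix} = \begin{pmatrix} \cos((i-j)\theta) & -\sin((i-j)\theta) \\ \sin((i-j)\theta) & \cos((i-j)\theta) \end{pmatrix} = R_{i-j}$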
For higher dimensions, the vector is split into 2D pairs, each rotated with a different frequency $\theta_k$, allowing the model to capture both fine-grained (high-frequency) and long-range (low-frequency) positional information.
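As a concrete illustration, here is a minimal NumPy sketch of this per-pair rotation. The frequency schedule $\theta_k = 10000^{-2k/d}$ is the convention from the original RoPE paper; the function name, shapes, and pairing of adjacent dimensions are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def rope_rotate(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, d), with d even.

    Each consecutive pair of dimensions (2k, 2k+1) at position i is rotated
    by angle i * theta_k, where theta_k = base ** (-2k / d).
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "embedding dimension must be even"

    # One frequency per 2D pair: theta_k = base^(-2k/d), k = 0 .. d/2 - 1
    theta = base ** (-np.arange(0, d, 2) / d)            # (d/2,)
    angles = np.arange(seq_len)[:, None] * theta[None]   # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]               # each (seq_len, d/2)
    out = np.empty_like(x)
    # 2D rotation of each (even, odd) pair by its position-dependent angle
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```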
RoPE applies an absolute rotation per position, but the dot product captures relative distance—unifying absolute and relative approaches without explicitly computing or storing relative biases.
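A quick numerical check of this property, continuing from the `rope_rotate` sketch above (the `rotate_at` helper and the specific positions are hypothetical, purely for illustration): rotating the same content vectors at different absolute positions but the same relative offset gives the same dot product.

```python
rng = np.random.default_rng(0)

def rotate_at(vec: np.ndarray, pos: int) -> np.ndarray:
    """Rotate a single vector as if it sat at position `pos` in a sequence."""
    padded = np.zeros((pos + 1, vec.shape[0]))
    padded[pos] = vec
    return rope_rotate(padded)[pos]

q_vec = rng.normal(size=64)  # stands in for X_i W_Q
k_vec = rng.normal(size=64)  # stands in for X_j W_K

# Same relative offset (i - j = 3) at different absolute positions:
s1 = rotate_at(q_vec, 7)  @ rotate_at(k_vec, 4)
s2 = rotate_at(q_vec, 20) @ rotate_at(k_vec, 17)
print(np.isclose(s1, s2))  # True: the score depends only on i - j
```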