# Multi-view Stereo
Stereo is the process of estimating the shape and depth of an object from 2D images.
Depth and shape can be estimated from:
- shading
![[stereo-shading.jpg]]
- focus
![[stereo-focus.jpg]]
- texture
![[stereo-texture 1.jpg]]
- perspective effects (lines, etc.)
![[stereo-perspective.jpg]]
- Also from motion, shadows, and the visual hull.
Fun fact: rabbits don't have 2D stereo. They have depth perception, but it does not come from 2D stereo.
Why multiple views?
- Structure and depth are inherently ambiguous from a single image: every 3D point along the same viewing ray projects to the same pixel.
## Human stereopsis
Human eyes fixate on a point in space: they rotate so that the corresponding images form in the centers of the foveae.
_Disparity_ occurs when the eyes fixate on one object; other objects then appear at different visual angles in the two eyes.
## Multi-view geometry problems
![[stereo-multiview.jpg]]
Approaches
- Structure from Motion: Given projections of the same 3D point in two or more images, compute the 3D coordinates of that point.
- Stereo correspondence: Given known camera parameters and a point in one of the images, where could its corresponding points be in the other images?
- [[Lukas-Kanade Optical Flow|Optical Flow]]: Given two images, find the location of a world point in a second close-by image with no camera info.
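For the structure problem, one standard way to compute the 3D point from two projections is linear (DLT) triangulation. The sketch below assumes both 3x4 projection matrices $P = K[R \; t]$ are already known; the names `triangulate_dlt` and `project_point` and the example camera values are hypothetical, chosen only for illustration, and this is not necessarily the method used in the course.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two calibrated views.

    P1, P2: 3x4 projection matrices, P = K [R | t] (assumed known).
    x1, x2: pixel coordinates (u, v) of the same 3D point in each image.
    Returns the estimated world point (X, Y, Z).
    """
    u1, v1 = x1
    u2, v2 = x2
    # Each view contributes two linear equations in the homogeneous point X:
    #   u * (P[2] . X) - P[0] . X = 0   and   v * (P[2] . X) - P[1] . X = 0
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Solve A X = 0 (with ||X|| = 1) in the least-squares sense: take the
    # right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # back from homogeneous coordinates


def project_point(P, X):
    """Project a homogeneous world point with P and return pixel (u, v)."""
    p = P @ X
    return p[:2] / p[2]


# Example: two identical cameras, the second centred 0.1 units along +X.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 2.0, 1.0])
X_est = triangulate_dlt(P1, P2, project_point(P1, X_true), project_point(P2, X_true))
print(X_est)  # approximately [0.2, -0.1, 2.0]
```

With noise-free projections the recovered point matches the true one; with real, noisy matches the SVD gives the algebraic least-squares estimate, which is usually refined further.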
## Geometry for a simple stereo system
Stereo: shape from "motion" between two views
We'll need to consider:
- Info on camera pose ("calibration")
- Image point correspondence
Assume parallel optical axes and known camera parameters (i.e., calibrated cameras). What is the expression for $Z$?
![[simple-stereo.jpg]]
Similar triangles $(p_{l}, P, p_{r})$ and $(O_{l}, P, O_{r})$ give:
$
\begin{array}{c}
\frac{T+x_{l}-x_{r}}{Z-f}=\frac{T}{Z} \\
Z=f \frac{T}{x_{r}-x_{l}}
\end{array}
$
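The denominator, $x_{r}-x_{l}$, is the disparity. Depth is inversely proportional to disparity: nearby points have large disparity, distant points have small disparity.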
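The formula $Z = fT/(x_r - x_l)$ can be applied to a whole disparity map at once. This is a minimal sketch assuming disparities and $f$ are in pixels and $T$ in metres; the function name `depth_from_disparity` and the choice to map non-positive (invalid) disparities to infinity are assumptions of this example, not part of the notes.

```python
import numpy as np


def depth_from_disparity(disparity, f, T):
    """Depth Z = f * T / d for the parallel-axis rig, with d = x_r - x_l.

    disparity: scalar or array of disparities, in pixels.
    f: focal length in pixels; T: baseline (Z is returned in T's units).
    Non-positive disparities are treated as invalid and mapped to inf
    (a choice made for this sketch, not part of the formula).
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.inf)
    valid = d > 0
    depth[valid] = f * T / d[valid]
    return depth


# f = 500 px, baseline T = 0.1 m: doubling the disparity halves the depth.
depths = depth_from_disparity([25, 50, 0], f=500.0, T=0.1)
print(depths)  # depths 2.0 m, 1.0 m and inf
```

The zero-disparity case corresponds to a point at infinity, which is why small disparity errors matter most for distant points.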
## Depth from disparity
Goal: recover depth by finding the image coordinate $x'$ that corresponds to $x$.
Sub-problems:
1. Calibration: How do we recover the relation between the cameras (if not already known)?
    1. Intrinsic matrices for both cameras (e.g., focal length $f$)
    2. Baseline distance $T$ in the parallel camera case
    3. $\mathbf{R}, \mathbf{t}$ in the non-parallel case
2. Correspondence: How do we search for the matching point $x'$?
    1. We need dense correspondence.
![[depth-from-disparity.png]]
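With parallel optical axes (and, as assumed here, equal intrinsics and a baseline along $x$), the match $x'$ lies in the same image row as $x$, so correspondence becomes a 1D search. The sketch below uses simple block matching with a sum-of-squared-differences cost, one of many possible matching costs; the function name `match_along_row` and the `half`/`max_disp` parameters are hypothetical.

```python
import numpy as np


def match_along_row(left, right, row, x, half=2, max_disp=16):
    """Column x' in `right` whose window best matches the window at (row, x) in `left`.

    Assumes parallel cameras with equal intrinsics, so the match lies on the same row.
    Cost: sum of squared differences (SSD) over a (2*half+1) x (2*half+1) window.
    (row, x) must be at least `half` pixels away from the image border.
    """
    w = left.shape[1]
    patch = left[row - half:row + half + 1, x - half:x + half + 1].astype(float)
    best_x, best_cost = None, np.inf
    # Only candidates on the same row, within +/- max_disp columns, are tested.
    for xr in range(max(half, x - max_disp), min(w - half, x + max_disp + 1)):
        cand = right[row - half:row + half + 1, xr - half:xr + half + 1].astype(float)
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_x, best_cost = xr, cost
    return best_x


# Synthetic test: the right view is the left view shifted 5 columns to the right.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, 5, axis=1)
x_match = match_along_row(left, right, row=10, x=15)
print(x_match - 15)  # disparity x_r - x_l: 5
```

Repeating the search for every pixel gives a dense disparity map, which the depth formula above turns into depth. Real images need more care (occlusions, textureless regions, window size trade-offs).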
$
w\left[\begin{array}{l}
u \\
v \\
1
\end{array}\right]=\left[\begin{array}{ccc}
f_{x} & s & u_{0} \\
0 & f_{y} & v_{0} \\
0 & 0 & 1
\end{array}\right]\left[\begin{array}{llll}
r_{11} & r_{12} & r_{13} & t_{x} \\
r_{21} & r_{22} & r_{23} & t_{y} \\
r_{31} & r_{32} & r_{33} & t_{z}
\end{array}\right]\left[\begin{array}{l}
x \\
y \\
z \\
1
\end{array}\right]
$
$
\mathbf{x}=\mathbf{K}[\mathbf{R} \quad \mathbf{t}] \mathbf{X}
$
where:
- $\mathbf{x}$: image coordinates $(u, v, 1)$
- $\mathbf{K}$: intrinsic matrix ($3 \times 3$)
- $\mathbf{R}$: rotation ($3 \times 3$)
- $\mathbf{t}$: translation ($3 \times 1$)
- $\mathbf{X}$: world coordinates $(X, Y, Z, 1)$
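The projection $\mathbf{x}=\mathbf{K}[\mathbf{R} \; \mathbf{t}]\mathbf{X}$ can be checked numerically. This is a minimal sketch; the function name `project` and the example intrinsics (zero skew, $f_x = f_y = 500$, principal point $(320, 240)$) are assumptions for illustration.

```python
import numpy as np


def project(K, R, t, X_world):
    """Pixel coordinates (u, v) of world point X_world under x = K [R | t] X."""
    X_h = np.append(X_world, 1.0)               # homogeneous (X, Y, Z, 1)
    Rt = np.hstack([R, np.reshape(t, (3, 1))])   # 3x4 extrinsics [R | t]
    x_h = K @ Rt @ X_h                           # equals w * (u, v, 1)
    return x_h[:2] / x_h[2]                      # divide out the scale w


K = np.array([[500.0, 0.0, 320.0],   # f_x, s, u_0
              [0.0, 500.0, 240.0],   # 0, f_y, v_0
              [0.0, 0.0, 1.0]])
R = np.eye(3)          # camera aligned with the world axes
t = np.zeros(3)        # camera at the world origin
print(project(K, R, t, np.array([0.2, -0.1, 2.0])))  # [370. 215.]
```

Dividing by the third component is what removes the scale factor $w$ from the left-hand side of the matrix equation.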
## Epipolar Geometry
Epipolar geometry constrains the 2D correspondence search to 1D: potential matches for $x$ have to lie on the corresponding epipolar line $l'$.
![[epipolar.jpg]]
This reduces the search space for stereo disparity estimation. To find $x'$: if I know $x$ and have calibrated cameras (known intrinsics $\mathbf{K}, \mathbf{K}'$ and the extrinsic relationship between them), I can restrict $x'$ to lie along $l'$.
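One standard way to compute $l'$ from calibrated cameras (not derived in these notes) goes through the essential matrix $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$ and the fundamental matrix $\mathbf{F} = \mathbf{K}'^{-T}\mathbf{E}\mathbf{K}^{-1}$, giving $l' = \mathbf{F}x$. The sketch below assumes the convention $X_2 = \mathbf{R}X_1 + \mathbf{t}$ between camera coordinate frames; the helper names `skew` and `epipolar_line` are hypothetical.

```python
import numpy as np


def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def epipolar_line(K1, K2, R, t, x):
    """Epipolar line l' = (a, b, c) in image 2 for pixel x = (u, v) in image 1.

    Assumes points map from camera-1 to camera-2 coordinates as X2 = R @ X1 + t.
    Any match (u', v') must satisfy a * u' + b * v' + c = 0.
    """
    E = skew(t) @ R                                   # essential matrix
    F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)   # fundamental matrix
    return F @ np.array([x[0], x[1], 1.0])


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.1, 0.0, 0.0])  # parallel cameras, baseline along X
l = epipolar_line(K, K, R, t, (370.0, 215.0))
print(l)  # approximately [0, 0.0002, -0.043], i.e. the row v' = 215
```

For the parallel setup the epipolar line comes out as the same image row ($a \approx 0$), which is exactly why the simple stereo case reduces to a search along scanlines.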
---
## References