Multi-view Stereo - Notes on AI

# Multi-view Stereo

Stereo is the process of estimating the shape and depth of an object from 2D images.

Depth and shape can be estimated from:
- shading ![[stereo-shading.jpg]]
- focus ![[stereo-focus.jpg]]
- texture ![[stereo-texture 1.jpg]]
- perspective effects (lines etc.) ![[stereo-perspective.jpg]]
- Also from motion, shadows and visual hull.

Fun fact: rabbits don't have 2D stereo. They have depth perception, but not from 2D stereo.

Why multiple views?
- Structure and depth can be ambiguous from a single image.

## Human stereopsis

Human eyes fixate on a point in space: they rotate so that the corresponding images form at the centers of the foveae.

_Disparity_ occurs when the eyes fixate on one object; other objects appear at different visual angles.

## Multi-view geometry problems

![[stereo-multiview.jpg]]

Approaches
- Structure from Motion: Given projections of the same 3D point in two or more images, compute the 3D coordinates of that point.
- Stereo correspondence: Given known camera parameters and a point in one of the images, where could its corresponding points be in the other images?
- [[Lukas-Kanade Optical Flow|Optical Flow]]: Given two images, find the location of a world point in a second close-by image with no camera info.

## Geometry for a simple stereo system

Stereo: shape from "motion" between two views

We'll need to consider:
- Info on camera pose ("calibration")
- Image point correspondence

Assume parallel optical axes and known camera parameters (i.e., calibrated cameras). What is the expression for $Z$?

![[simple-stereo.jpg]]

Similar triangles $\left(p_{l}, P, p_{r}\right)$ and $\left(O_{l}, P, O_{r}\right)$:

$$
\begin{array}{c}
\frac{T+x_{l}-x_{r}}{Z-f}=\frac{T}{Z} \\
Z=f \frac{T}{x_{r}-x_{l}}
\end{array}
$$

Here $f$ is the focal length and $T$ is the baseline distance between the two camera centers. The denominator $x_{r}-x_{l}$ is the disparity.

## Depth from disparity

Goal: recover depth by finding the image coordinate $x^{\prime}$ that corresponds to $x$.

Sub-problems
1. Calibration: How do we recover the relation of the cameras (if not already known)?
    1. Intrinsic matrices for both cameras (e.g., $f$)
    2. Baseline distance $T$ in the parallel camera case
    3. $\mathrm{R}, \mathrm{t}$ in the non-parallel case
2. Correspondence: How do we search for the matching point $x^{\prime}$?
    1. We need dense correspondence

![[depth-from-disparity.png]]

$$
w\left[\begin{array}{l} u \\ v \\ 1 \end{array}\right]=\left[\begin{array}{ccc} f_{x} & s & u_{0} \\ 0 & f_{y} & v_{0} \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{llll} r_{11} & r_{12} & r_{13} & t_{x} \\ r_{21} & r_{22} & r_{23} & t_{y} \\ r_{31} & r_{32} & r_{33} & t_{z} \end{array}\right]\left[\begin{array}{l} x \\ y \\ z \\ 1 \end{array}\right]
$$

$$
\mathbf{x}=\mathbf{K}[\mathbf{R} \quad \mathbf{t}] \mathbf{X}
$$

where
- $\mathbf{x}$: image coordinates $(u, v, 1)$
- $\mathbf{K}$: intrinsic matrix $(3 \times 3)$
- $\mathbf{R}$: rotation $(3 \times 3)$
- $\mathbf{t}$: translation $(3 \times 1)$
- $\mathbf{X}$: world coordinates $(x, y, z, 1)$

## Epipolar Geometry

Epipolar geometry constrains the 2D search to 1D: potential matches for $x$ have to lie on the corresponding line $l^{\prime}$.

![[epipolar.jpg]]

- Reduces the search space for stereo disparity estimation.
- Helps find $x^{\prime}$: if I know $x$ and have calibrated cameras (known intrinsics $\mathrm{K}, \mathrm{K}^{\prime}$ and extrinsic relationship), I can restrict $x^{\prime}$ to lie along $l^{\prime}$.
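To make the projection equation and the depth-from-disparity relation concrete, here is a minimal NumPy sketch. Everything numeric in it is an assumption for illustration (focal length, baseline, principal point, the test point), as are the helper names `project` and `depth_from_disparity`. It also assumes the usual pinhole sign convention with the right camera center displaced by $+T$ along x, under which the positive disparity is $x_l - x_r$; since the sign depends on how image x is measured in the figure, the sketch uses the magnitude.

```python
import numpy as np

def project(K, R, t, X_world):
    """Project a 3D world point with x = K [R | t] X and return pixel (u, v)."""
    X_h = np.append(X_world, 1.0)              # homogeneous world point (x, y, z, 1)
    Rt = np.hstack([R, t.reshape(3, 1)])       # 3x4 extrinsic matrix [R | t]
    wuv = K @ Rt @ X_h                         # (w*u, w*v, w)
    return wuv[:2] / wuv[2]                    # divide out the scale w

def depth_from_disparity(f, T, x_l, x_r):
    """Z = f * T / disparity, using the disparity magnitude (sign is convention-dependent)."""
    return f * T / abs(x_l - x_r)

# Hypothetical calibrated rig: same intrinsics, parallel optical axes.
f, T = 700.0, 0.12                             # focal length (pixels), baseline (meters)
K = np.array([[f, 0.0, 320.0],
              [0.0, f, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                  # no rotation between the cameras
t_left = np.zeros(3)                           # left camera sits at the world origin
t_right = np.array([-T, 0.0, 0.0])             # right camera center at (+T, 0, 0), t = -R @ C

P = np.array([0.3, -0.1, 2.5])                 # a world point 2.5 m in front of the rig

u_l, v_l = project(K, R, t_left, P)
u_r, v_r = project(K, R, t_right, P)
Z = depth_from_disparity(f, T, u_l, u_r)

print(f"left pixel:  ({u_l:.2f}, {v_l:.2f})")
print(f"right pixel: ({u_r:.2f}, {v_r:.2f})")
print(f"recovered depth Z = {Z:.4f}")
```

Because the cameras share intrinsics and differ only by a horizontal shift, the principal point cancels in the disparity and both projections land on the same image row, which is the parallel-camera case of the epipolar constraint.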
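The epipolar restriction of $x'$ to a line $l'$ can also be sketched numerically. The note does not say how $l'$ is computed; the sketch below assumes the standard fundamental-matrix formulation, $l' = F x$ with $F = K'^{-\top} [t]_\times R K^{-1}$, where $R, t$ map left-camera coordinates to right-camera coordinates. The helper names and all numeric values are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """F for cameras related by X2 = R @ X1 + t (assumed convention)."""
    E = skew(t) @ R                                   # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def to_pixel(K, X_cam):
    """Project a point given in camera coordinates to homogeneous pixels (u, v, 1)."""
    x = K @ X_cam
    return x / x[2]

def point_line_distance(x, line):
    """Distance in pixels from homogeneous point x to line (a, b, c)."""
    return abs(x @ line) / np.hypot(line[0], line[1])

# Hypothetical calibrated pair: shared intrinsics, small rotation about the y axis.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.12, 0.0, 0.01])

F = fundamental_matrix(K, K, R, t)

P_near = np.array([0.3, -0.1, 2.5])                   # point in left-camera coordinates
P_far = 2.0 * P_near                                  # same viewing ray, twice the depth

x = to_pixel(K, P_near)                               # identical left pixel for both points
line = F @ x                                          # epipolar line l' in the right image

d_near = point_line_distance(to_pixel(K, R @ P_near + t), line)
d_far = point_line_distance(to_pixel(K, R @ P_far + t), line)
print(f"distance to l' (near point): {d_near:.2e} px")
print(f"distance to l' (far point):  {d_far:.2e} px")
```

Both candidate depths along the left viewing ray project onto the same line in the right image, so a correspondence search only needs to scan that line rather than the whole image.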