# Image Segmentation
Image segmentation breaks up images into semantically meaningful or perceptually similar regions.
![[image-segmentation.jpg]]
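The right segmentation depends on context:
- Segmentation and recognition are coupled: recognizing an object helps delineate it, and delineating it helps recognize it
- Which one should be done first?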
![[segmentation-ambiguity.jpg]]
## Segmentation as clustering
Find similar regions with [[Clustering]] techniques. [[K-Means]] is widely used; for clusters with elongated (anisotropic) variance, a [[Gaussian Mixture Model]] is a better fit.
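To make the clustering view concrete, here is a minimal sketch that runs K-Means on the grayscale intensities of a toy image and reshapes the cluster labels back into a label map. The function name `kmeans_segment`, the toy image, and the evenly spaced initial centers are assumptions chosen for illustration; in practice a library implementation and richer features (color, position, texture) would typically be used.

```python
def kmeans_segment(image, k=2, iters=10):
    """Cluster grayscale pixel intensities with K-Means; return (label_map, centers)."""
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    # Assumption: deterministic init, centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(v):
        # Index of the center closest to intensity v.
        return min(range(k), key=lambda c: (v - centers[c]) ** 2)

    for _ in range(iters):
        # Assignment step: each pixel joins its nearest center.
        labels = [nearest(v) for v in pixels]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(pixels, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)

    # Final assignment against the converged centers, reshaped to image layout.
    labels = [nearest(v) for v in pixels]
    width = len(image[0])
    label_map = [labels[r * width:(r + 1) * width] for r in range(len(image))]
    return label_map, centers


# Toy image: dark region on the left, bright region on the right.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 198],
    [9, 14, 202, 207],
]
label_map, centers = kmeans_segment(image, k=2)
print(label_map)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Each pixel is clustered independently of its neighbours here, so the segments are only as coherent as the features are; adding pixel coordinates to the feature vector is one common way to encourage spatially compact regions.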
## Fully Convolutional Encoder-Decoder
Uses [[Convolutional Neural Networks (CNN)]] without dense layers in an encoder/decoder setup to predict a class label for every pixel as a supervised classification task (U-Net shaped model).
Skip connections between corresponding encoder and decoder layers can be used to transfer fine-grained details to the decoder.
![[fullyconv-segmentation.jpg]]
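The data flow of such an encoder/decoder can be sketched without any learned weights. The toy example below is not a trainable network: it only shows how pooling reduces resolution in the encoder, how upsampling restores it in the decoder, and how a skip connection re-injects the full-resolution encoder features. The helper names `max_pool2`, `upsample2`, and `concat_skip` are hypothetical, and a real model would place learned convolutions between these steps.

```python
def max_pool2(x):
    """Encoder step: 2x2 max pooling halves height and width."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def upsample2(x):
    """Decoder step: nearest-neighbour upsampling doubles height and width."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def concat_skip(decoder_map, encoder_map):
    """Skip connection: pair decoder and encoder features at every pixel."""
    return [
        [(d, e) for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(decoder_map, encoder_map)
    ]


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 6, 7],
    [0, 1, 8, 9],
]
bottleneck = max_pool2(image)        # 2x2: [[4, 5], [1, 9]]
decoded = upsample2(bottleneck)      # 4x4 again, but blocky
fused = concat_skip(decoded, image)  # 4x4, two "channels" per pixel
print(fused[0])  # [(4, 1), (4, 2), (5, 0), (5, 0)]
```

The upsampled map alone is constant within each 2x2 block; the second channel supplied by the skip connection is what carries the per-pixel detail the decoder needs for sharp segment boundaries.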
Dilated convolutions are also used: they enlarge the receptive field without reducing the spatial resolution.
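A minimal 1D sketch of a dilated convolution follows; the function name `dilated_conv1d` and the example signal are assumptions for illustration. The same three kernel weights cover a span of 3 samples with dilation 1 and 5 samples with dilation 2, so the receptive field grows while the number of weights stays fixed.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with `dilation - 1` skipped samples between taps."""
    # Number of input samples one output value can see (its receptive field).
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at i, i+1, i+2
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4
print(dense)    # [-2, -2, -2, -2, -2]
print(dilated)  # [-4, -4, -4]
```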
## Depth (prediction) for segmentation
Depth
- Cue for the true scale of an object
- Towards scale-invariant features!
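Why depth is a cue for true scale can be illustrated with the pinhole camera model, under which an object's size in pixels is proportional to its metric size divided by its depth. The sketch below assumes that model; the function name `metric_size`, the focal length, and the numbers are hypothetical values for illustration and are not taken from the papers listed below.

```python
def metric_size(pixel_size, depth_m, focal_px):
    """Invert the pinhole relation pixel_size = focal_px * metric_size / depth_m."""
    return pixel_size * depth_m / focal_px


# The same object seen close up and far away (hypothetical values).
near = metric_size(pixel_size=200, depth_m=5.0, focal_px=1000.0)
far = metric_size(pixel_size=50, depth_m=20.0, focal_px=1000.0)
print(near, far)  # 1.0 1.0
```

The apparent size differs by a factor of four between the two views, yet dividing out the depth gives the same metric size, which is the sense in which depth supports scale-invariant features.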
Papers
- 3D Neighborhood Convolution: Learning Depth-Aware Features for RGB-D and RGB Semantic Segmentation, Yunlu Chen, Thomas Mensink and Efstratios Gavves, In International Conference on 3D Vision (3DV) 2019
- Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection, Alex Bewley, Pei Sun, Thomas Mensink, Dragomir Anguelov, Cristian Sminchisescu, arXiv 2020
---
## References