# Image Segmentation

Image segmentation partitions an image into semantically meaningful or perceptually similar regions.

![[image-segmentation.jpg]]

What is the context?
- Segmentation and recognition tasks are coupled
- Which one to do first?

![[segmentation-ambiguity.jpg]]

## Segmentation as clustering

Find similar regions with [[Clustering]] techniques. [[K-Means]] is widely used; for elongated (anisotropic) clusters, a [[Gaussian Mixture Model]] is used instead.

## Fully Convolutional Encoder-Decoder

Uses [[Convolutional Neural Networks (CNN)]] without dense layers in an encoder/decoder setup to predict segments as a supervised per-pixel classification task (U-Net shaped model). Skip connections between encoder and decoder layers can be used to transfer fine-grained details to the decoder.

![[fullyconv-segmentation.jpg]]

Dilated convolutions are also used, since they enlarge the receptive field without reducing spatial resolution.

## Depth (prediction) for segmentation

Depth
- Cue for the true scale of an object
- Towards scale invariant features!

Papers
- 3D Neighborhood Convolution: Learning Depth-Aware Features for RGB-D and RGB Semantic Segmentation, Yunlu Chen, Thomas Mensink and Efstratios Gavves, In International Conference on 3D Vision (3DV) 2019
- Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection, Alex Bewley, Pei Sun, Thomas Mensink, Dragomir Anguelov, Cristian Sminchisescu, arXiv 2020
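
The clustering idea from "Segmentation as clustering" can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes pixels are grouped by colour alone (no spatial coordinates), and the function name `kmeans_segment`, the farthest-point initialisation, and the toy image are all choices made here for the example.

```python
import numpy as np

def kmeans_segment(image, k=2, n_iters=20):
    """Cluster pixel colours with plain k-means.

    image: float array of shape (H, W, C).
    Returns an (H, W) integer label map and the (k, C) centroids.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)

    # Assumption: deterministic farthest-point initialisation.
    # Start from the first pixel, then repeatedly add the pixel that is
    # farthest from all centroids chosen so far.
    centroids = [pixels[0]]
    for _ in range(1, k):
        d = np.min([((pixels - ctr) ** 2).sum(axis=1) for ctr in centroids], axis=0)
        centroids.append(pixels[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iters):
        # Assignment step: squared distance of every pixel to every centroid.
        dists = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

    return labels.reshape(h, w), centroids

# Toy image: dark left half, bright right half.
toy = np.zeros((4, 6, 3))
toy[:, :3] = [0.1, 0.1, 0.1]
toy[:, 3:] = [0.9, 0.8, 0.7]
segments, centers = kmeans_segment(toy, k=2)
```

Because k-means assigns pixels by Euclidean distance to a centroid, it implicitly favours roughly spherical clusters in colour space. For elongated clusters, a Gaussian mixture with full covariance matrices (for example `sklearn.mixture.GaussianMixture` with `covariance_type="full"`) could be fitted to the same `pixels` array instead.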