# YOLO Object Detection

![[YOLO-detection.jpg]]

Region proposal algorithms like [[FastRCNN]] are two-stage detectors:
- Stage one: Propose regions
- Stage two: Extract features and classify objects

YOLO is a single-shot detector, i.e. it does the following in one pass:
- Extracts features with convolutions
- Regresses the predictions by minimizing a loss function

## History
2015 YOLO
- Joseph Redmon et al.
- Won the OpenCV People's Choice Award at CVPR
- First 'real' real-time object detector

2016 YOLO v2
- Redmon
- Anchor boxes
- Darknet-19 CNN

2018 YOLO v3
- Redmon
- Prediction at 3 scales (small to large objects)
- Darknet-53 CNN

2020 YOLO v4
- Alexey Bochkovskiy et al.
- Redmon had stopped working on YOLO because the work was being used for purposes he objected to, such as military applications
- Adds many recent DL techniques

2020 YOLO v5 and other versions
- Now established as mainstream real-time object detection

## Objective
YOLO predicts the quantity $y=\left(p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c\right)$ where
- $b_x, b_y, b_h, b_w$ - box coordinates (center position, height and width)
- $p_c$ - probability that there is an object in the box
- $c$ - conditional class probability for each class

Then it minimizes the squared loss:
![[YOLO-objective.jpg]]

YOLO predictions:
![[YOLO-grids.jpg]]

## Anchor Boxes
YOLO v1: bounding boxes are learned from a random initialization, which led to imprecise localization.

YOLO v2: bounding boxes are still learned, but their size and shape are initialized from predefined anchor boxes.

How to compute anchor boxes? [[Clustering]] the box dimensions from the COCO and PASCAL VOC datasets with [[K-Means]], using K=5. Both datasets resulted in more or less the same anchor box dimensions, which suggests that typical box shapes are similar across common datasets.

## CNN architectures
YOLO v1 - inspired by [[The Inception Net]], 24 conv layers, 2 fc layers

YOLO v3 - Darknet-53, with 53 conv layers, residual connections
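
As a rough illustration of the anchor box computation described under Anchor Boxes, the sketch below clusters box (width, height) pairs with k-means and K=5. It rests on a few assumptions that are not in the note itself: the distance is `1 - IoU` between a box and a centroid (the choice made in the YOLO v2 paper), the centroid update is a plain mean, and the input boxes are random stand-in data rather than real COCO or PASCAL VOC annotations. The function names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating every box as anchored at the same corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_centroids = centroids[:, 0] * centroids[:, 1]
    return inter / (area_boxes[:, None] + area_centroids[None, :] - inter)


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """K-means over (w, h) pairs with distance 1 - IoU (assumed, see lead-in)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct boxes picked at random.
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IoU distance is the same as largest IoU.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new_centroids = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Sort by area so the result reads small to large.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]


# Hypothetical stand-in data: 500 boxes with normalized width and height.
rng = np.random.default_rng(42)
boxes = rng.uniform(0.05, 1.0, size=(500, 2))

anchors = kmeans_anchors(boxes, k=5)
print(anchors)  # five (w, h) pairs, smallest area first
```

With real annotations, `boxes` would hold the width and height of every ground-truth box, normalized by image size. Running the same function on two datasets and comparing the resulting anchors is one way to check the observation that COCO and PASCAL VOC give similar shapes.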