# YOLO Object Detection
![[YOLO-detection.jpg]]
Region proposal algorithms like [[FastRCNN]] are two-stage detectors:
- Stage one: propose regions
- Stage two: extract features and classify the objects in each region
YOLO is a single-shot detector, i.e. it does the following in one pass:
- Extracts features with convolutions
- Regresses boxes and class probabilities directly by minimizing a single loss function
## History
2015 YOLO - Joseph Redmon et al. - OpenCV People's Choice Award at CVPR.
- First 'real' real-time object detector
2016 YOLO v2 - Redmon
- Anchor boxes
- Darknet-19 CNN
2018 YOLO v3 - Redmon
- Prediction at 3 scales (small to large objects)
- Darknet-53 CNN
2020 YOLO v4 - Alexey Bochkovskiy et al.
- Redmon stopped working on YOLO because the work was being used for purposes he objected to, such as military applications
- Adds many recent deep learning techniques (e.g. Mosaic data augmentation, Mish activation, CIoU loss)
2020 YOLO v5 and other versions
- YOLO is now established as the mainstream approach to real-time object detection
## Objective
For each box, YOLO predicts the vector
$y=\left(p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c\right)$
where,
$b_x, b_y, b_h, b_w$ - box coordinates (center $x$, center $y$, height, width)
$c$ - box confidence, i.e. the probability that there is an object in the box
$p_c$ - conditional class probability, one value per class
Training then minimizes a sum-squared error loss over these quantities:
![[YOLO-objective.jpg]]
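As a rough illustration of the squared loss, the sketch below computes it for a single grid cell with one predicted box. It assumes the weighting from the original YOLO paper ($\lambda_{coord}=5$, $\lambda_{noobj}=0.5$, square roots on height and width) and simplifies the confidence target to 1 for a cell that contains an object. The function name `cell_loss` and the dict layout are hypothetical, chosen only for this example.

```python
import math


def cell_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """Simplified sum-squared loss for one grid cell with a single box.

    pred and target are dicts (hypothetical layout) with keys:
      'obj'     - box confidence (float)
      'box'     - (bx, by, bh, bw), same order as in the note
      'classes' - list of class probabilities
    """
    if target["obj"] == 0:
        # Empty cell: only the confidence is penalized, with a reduced weight.
        return lambda_noobj * pred["obj"] ** 2

    px, py, ph, pw = pred["box"]
    tx, ty, th, tw = target["box"]

    # Center error.
    center = (px - tx) ** 2 + (py - ty) ** 2
    # Size error on square roots, so small boxes are penalized more
    # than large boxes for the same absolute deviation.
    size = (math.sqrt(max(ph, 0.0)) - math.sqrt(th)) ** 2 \
         + (math.sqrt(max(pw, 0.0)) - math.sqrt(tw)) ** 2
    # Confidence error (target simplified to 1 here).
    conf = (pred["obj"] - target["obj"]) ** 2
    # Class probability error.
    cls = sum((p - t) ** 2 for p, t in zip(pred["classes"], target["classes"]))

    return lambda_coord * (center + size) + conf + cls
```

The full loss sums this term over all grid cells and, when a cell predicts several boxes, only the box that best overlaps the ground truth contributes the coordinate terms; that selection is left out here.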
YOLO predictions:
![[YOLO-grids.jpg]]
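To make the grid-based prediction more concrete, here is a minimal sketch of how one ground-truth box could be turned into a training target for its grid cell. It assumes the YOLO v1 convention: the image is divided into an $S \times S$ grid (7 by default here), the cell containing the box center is responsible for the object, the center is stored relative to that cell, and height and width are stored relative to the whole image. The function name `encode_box` and the returned dict layout are hypothetical.

```python
def encode_box(cx, cy, w, h, class_id, S=7, num_classes=20):
    """Encode one ground-truth box as a target for its grid cell.

    cx, cy, w, h are normalized to [0, 1] relative to the image.
    Returns ((row, col), target) where target holds the box confidence,
    the box as (bx, by, bh, bw), and one-hot class probabilities.
    """
    # Grid cell that contains the box center (clamped for cx or cy == 1.0).
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)

    # Center offset inside the responsible cell, in [0, 1].
    bx = cx * S - col
    by = cy * S - row

    # One-hot class vector for the object's class.
    class_probs = [0.0] * num_classes
    class_probs[class_id] = 1.0

    target = {"obj": 1.0, "box": (bx, by, h, w), "classes": class_probs}
    return (row, col), target


# A box centered in the image, 20% wide and 40% tall, of class 3.
cell, target = encode_box(0.5, 0.5, 0.2, 0.4, class_id=3)
```

All other cells get a target with confidence 0, so only the responsible cell is trained to output this box.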
## Anchor Boxes
YOLO v1: bounding boxes are learned from a random initialization, which led to imprecise localization.
YOLO v2: bounding boxes are still learned, but their size/shape is initialized from predefined anchor boxes.
How are the anchor boxes computed?
By [[Clustering]] the box dimensions of the COCO and PASCAL VOC datasets with [[K-Means]] and K=5. Both datasets resulted in broadly similar anchor box dimensions, which suggests that the same anchor shapes carry over reasonably well between common datasets.
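The clustering step can be sketched in plain Python as follows. This is a minimal illustration, not the Darknet implementation: it assumes boxes are given as `(width, height)` tuples and uses $1 - \text{IoU}$ as the distance (so large boxes do not dominate the way they would with Euclidean distance), which is the choice described in the YOLO v2 paper. The function names, the random initialization, and the toy data are assumptions made for this example.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (w, h) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) tuples into k anchor shapes using 1 - IoU as distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k randomly chosen boxes
    for _ in range(iters):
        # Assignment step: highest IoU is the same as lowest 1 - IoU.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        # Update step: mean width and height of each cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])  # smallest area first


# Toy data: three small, roughly square boxes and three large, wide boxes.
boxes = [(10, 10), (11, 10), (10, 11), (100, 50), (101, 50), (100, 51)]
anchors = kmeans_anchors(boxes, k=2)
```

On real data, `boxes` would hold the width and height of every ground-truth box in the training set, and `k=5` would give the five anchor shapes.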
## CNN architectures
YOLO v1 - inspired by GoogLeNet ([[The Inception Net]]), 24 conv layers, 2 fc layers
YOLO v3 - Darknet-53, 53 conv layers with residual connections
---
## References