The network transforms the object detection problem into a classification problem and greatly improves accuracy.

It generates partially overlapping candidate regions for each detection target, so the overlapping areas are processed repeatedly.
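
The degree of overlap between two candidate regions is commonly measured by intersection over union (IoU). The following is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function name `iou` and the example boxes are hypothetical and chosen only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical candidate regions that partially cover the same target.
proposal_a = (10, 10, 50, 50)
proposal_b = (30, 30, 70, 70)
overlap = iou(proposal_a, proposal_b)
```

Any pixels inside the shared area would be passed through the feature extractor once per candidate region, which is the repeated work the text refers to.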


SPP-net

It introduces a spatial pyramid pooling layer after the last convolutional layer, which eliminates repeated processing.

Training is a multi-stage process and takes a long time.
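
The idea behind spatial pyramid pooling can be sketched as follows: a feature map of arbitrary size is max-pooled over several grid resolutions, and the results are concatenated into a vector of fixed length. This is a simplified, single-channel illustration in plain Python, not the published implementation; the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the bin-boundary rule are assumptions.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into a fixed-length list.

    For each level n the map is divided into an n x n grid and the
    maximum of every cell is kept, giving sum(n * n) output values
    regardless of the input height and width.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of grid cell i (floor for start, ceil for end).
            r0 = (i * h) // n
            r1 = ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0 = (j * w) // n
                c1 = ((j + 1) * w + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two hypothetical feature maps of different sizes.
small_map = [[r * 7 + c for c in range(7)] for r in range(5)]
large_map = [[(r * c) % 11 for c in range(9)] for r in range(13)]

small_vec = spatial_pyramid_pool(small_map)
large_vec = spatial_pyramid_pool(large_map)
```

Both inputs yield vectors of the same length (1 + 4 + 16 values for the assumed levels), so the convolutional features need to be computed only once per image before pooling each region.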

Fast R-CNN

Training and testing are significantly faster than with SPP-net. The input image can be any size.

The network still depends on a separate candidate region selection algorithm.

Faster R-CNN

This network is faster than Fast R-CNN and no longer depends on a separate region selection algorithm.

The training process is complex, and the computation still leaves considerable room for optimization.


Multi-scale feature maps are used, and processing is fast.

The network is not robust when detecting small objects.


The network meets real-time requirements and uses the full image as context information.

It is relatively sensitive to object scale and performs poorly on small targets.