REVIEW OF NEURAL NETWORK MODELS FOR 2D OBJECT DETECTION
DOI:
https://doi.org/10.18372/2310-5461.68.20280
Keywords:
object detection, neural network, convolutional neural network, machine learning, deep learning, dataset
Abstract
This review article catalogues deep learning neural network models for detecting 2D objects in images. The models are divided into three groups: single-stage, two-stage, and transformer-based. Traditional methods developed before deep learning was applied to classification and detection tasks are also mentioned. The following computer vision tasks are considered: image classification, image localization, object detection, and image segmentation. Fundamental components of 2D object detection are covered, including commonly used datasets (PASCAL-VOC, ILSVRC, MS-COCO, OpenImages, Objects365) and a list of quality indicators with brief descriptions. These indicators are split into quality metrics and performance metrics. The quality metrics are Precision, Recall, Average Precision (AP), Mean Average Precision (mAP), and Average Recall (AR). The performance metrics are Inference Time, FPS (Frames Per Second), Latency, and Throughput. mAP values for different models and datasets are presented, along with examples of how mAP depends on speed and number of parameters across versions of the YOLO model family. A method for comparative analysis of detection models is presented, with suitable software tools indicated. The idea behind each group of models is briefly described. The article also contains a brief historical overview of the AlexNet model's win at the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition. The review is intended for readers who are unfamiliar with 2D object detection using neural networks and machine learning methods and want to get up to speed quickly.
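As an illustration of the quality metrics named in the abstract, the following is a minimal Python sketch of Intersection over Union (IoU), Average Precision (AP) for one class, and mAP as the mean of per-class AP values. It is not taken from the article: the function names, the (x_min, y_min, x_max, y_max) box layout, the default IoU threshold of 0.5, and the greedy matching of each detection to the best still-unmatched ground-truth box are all assumptions made for the example. Benchmark protocols such as PASCAL-VOC and MS-COCO differ in matching and interpolation details.

```python
from typing import Dict, List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """AP for one class: area under the precision-recall curve.

    detections are (confidence, box) pairs. Simplification (assumption):
    each detection is matched greedily to the best unmatched ground-truth box.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags: List[bool] = []
    # Process detections from the most to the least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx >= 0 and best_iou >= iou_threshold
        if is_tp:
            matched[best_idx] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision monotonically non-increasing (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)
```

In this sketch, Precision is the share of detections that are true positives and Recall is the share of ground-truth boxes that were found; AP summarizes the trade-off between the two as the confidence threshold is lowered.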
License
The scientific journal adheres to the principles of Open Access and provides free, immediate, and permanent access to all published materials without financial, technical, or legal barriers for readers.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.




