REVIEW OF NEURAL NETWORK MODELS FOR 2D OBJECT DETECTION TASKS

Serhii Bielov; Maksym Zaliskyi

doi:10.18372/2310-5461.68.20280

Author(s)

Serhii Bielov, State University "Kyiv Aviation Institute", Kyiv, Ukraine
Maksym Zaliskyi, State University "Kyiv Aviation Institute", Kyiv, Ukraine

DOI:

https://doi.org/10.18372/2310-5461.68.20280

Keywords:

object detection, neural network, convolutional neural network, machine learning, deep learning, dataset

Abstract

This review article presents a catalog of deep learning neural network models for detecting 2D objects in images. The models are divided into three groups: one-stage, two-stage, and transformer-based. Traditional methods developed before deep learning was applied to classification and detection are also mentioned. The article formulates the following computer vision tasks: image classification, object localization, object detection, and image segmentation. It also covers some fundamental components of 2D object detection: widely used datasets (PASCAL-VOC, ILSVRC, MS-COCO, OpenImages, Objects365) and a list of performance indicators with brief descriptions. The indicators are divided into quality metrics and speed metrics. The quality metrics are Precision, Recall, Average Precision (AP), mean Average Precision (mAP), and Average Recall (AR). The speed metrics are inference time, FPS (frames per second), latency, and throughput. The article reports mAP for various models and datasets, along with examples of how mAP depends on speed and on parameter count for different versions of the YOLO model family. It describes an approach to the comparative analysis of detection models and names the relevant software tools. For each group of models, a brief description of how it operates is given. The article also includes a short historical note on the use of the AlexNet model in the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition. The review is aimed at students and researchers who are new to 2D object detection with neural networks and machine learning methods and who want a quick introduction to the topic.

Author Biographies

Serhii Bielov, State University "Kyiv Aviation Institute", Kyiv, Ukraine

PhD student, Department of Telecommunication and Radio Electronic Systems

Maksym Zaliskyi, State University "Kyiv Aviation Institute", Kyiv, Ukraine

Doctor of Technical Sciences, Professor

