MHNet: Multi-scale Hierarchical Extraction Network for Small Object Detection in UAV Images
Keywords:
Object detection, Unmanned aerial vehicle images, Multiscale feature, Attention mechanism
Abstract
Unmanned aerial vehicle (UAV) images are characterized by small object sizes, dense distributions, and complex backgrounds. Existing detection methods perform well under ordinary conditions but poorly on UAV images. In this paper, we propose MHNet, a small object detection framework for UAV images that addresses these problems through multi-scale feature processing and more efficient feature fusion. First, we design a multi-scale hierarchical convolution (MHC) module that extracts features at different scales layer by layer, providing finer-grained feature information and a larger receptive field. Second, we design the SPPFC module to capture the multi-scale features extracted by the backbone. We introduce contextual anchor attention (CAA) into the SPPFC module to strengthen contextual dependencies and reinforce feature information across scales, thereby enriching the semantic information of high-level features. In addition, we use an auxiliary detection head, combined with a new feature fusion architecture, to improve the prediction of small objects. The CAA module downsamples the input features of the auxiliary detection head to enhance the feature information of the other two detection heads. This design effectively promotes the fusion of high-level and low-level information. Experiments on VisDrone2019 and UAVDT demonstrate the effectiveness of MHNet. On VisDrone2019, MHNet reaches an mAP of 28.2% and an mAP50 of 45.8%, improving on the baseline by 4.8% and 6.7%, respectively.
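The abstract does not give the internal structure of the MHC module, so the following is only a minimal sketch of one way a layer-by-layer, multi-scale extraction could enlarge the receptive field, assuming a Res2Net-style hierarchical channel split. The names `box_filter3`, `hierarchical_multiscale`, and `receptive_field` are hypothetical, 1-D rows stand in for 2-D feature maps, and a fixed 3-tap average stands in for a learned convolution.

```python
def box_filter3(signal):
    """Stand-in for a learned 3-tap convolution: average with neighbours (zero padding)."""
    padded = [0.0] + list(signal) + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(signal))]


def hierarchical_multiscale(channels, groups=4):
    """Res2Net-style hierarchical split (an assumption, not the paper's MHC definition).

    channels: list of 1-D feature rows; the row count must be divisible by `groups`.
    Group 0 passes through unchanged. Every later group is filtered, and from the
    third group on the previous group's output is added first, so each later
    group sees a wider neighbourhood of the input.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    width = len(channels) // groups
    splits = [channels[g * width:(g + 1) * width] for g in range(groups)]

    outputs = [splits[0]]  # identity branch
    previous = None
    for split in splits[1:]:
        if previous is not None:
            # Fuse the previous group's output before filtering.
            split = [[a + b for a, b in zip(row, prev_row)]
                     for row, prev_row in zip(split, previous)]
        previous = [box_filter3(row) for row in split]
        outputs.append(previous)
    # Concatenate the groups back along the channel axis.
    return [row for group in outputs for row in group]


def receptive_field(row):
    """Number of positions that respond to a unit impulse."""
    return sum(1 for value in row if abs(value) > 1e-12)


impulse = [0.0] * 4 + [1.0] + [0.0] * 4  # length 9, impulse at the centre
features = hierarchical_multiscale([impulse[:] for _ in range(4)], groups=4)
fields = [receptive_field(row) for row in features]
print(fields)  # [1, 3, 5, 7]
```

In this toy setting, the four channel groups respond over 1, 3, 5, and 7 positions, which illustrates how a single block of this kind can carry several scales at once; the actual MHC design may differ.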