MHNet: Multi-scale Hierarchical Extraction Network for Small Object Detection in UAV Images

Authors

  • Ziyang Xing Hebei University of Science and Technology
  • Xuebin Xu Hebei University of Science and Technology
  • Meiling Sun Shijiazhuang Preschool Education College
  • Kuihe Yang Hebei University of Science and Technology

Keywords:

Object detection, Unmanned aerial vehicle images, Multiscale feature, Attention mechanism

Abstract

Unmanned aerial vehicle (UAV) images are characterized by small objects, dense object distributions, and complex backgrounds. Existing detection methods perform well on conventional images but poorly on UAV images. In this paper, we propose MHNet, a small object detection framework for UAV images that addresses these problems through multi-scale feature processing and more efficient feature fusion. First, we design a multi-scale hierarchical convolution (MHC) module that extracts features at different scales, layer by layer, providing finer-grained feature information and a larger receptive field. Second, we design the SPPFC module to capture the multi-scale features extracted by the backbone. We introduce contextual anchor attention (CAA) into the SPPFC module to strengthen contextual dependencies and reinforce feature information across scales, thereby enriching the semantic information of high-level features. In addition, we use an auxiliary detection head, combined with a new feature fusion architecture, to improve the prediction of small objects. The CAA module downsamples the input features of the auxiliary detection head to enhance the feature information of the other two detection heads. This design effectively promotes the fusion of high-level and low-level information. Experiments on VisDrone2019 and UAVDT demonstrate the effectiveness of MHNet. On VisDrone2019, MHNet reaches an mAP of 28.2% and an mAP50 of 45.8%, improving on the baseline by 4.8% and 6.7%, respectively.
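The abstract does not describe the internals of SPPFC. Its name suggests it extends SPPF from Ultralytics YOLO, so the pure-Python sketch below (an assumption for illustration, not the authors' code) shows only the SPPF idea it presumably builds on: applying one k × k, stride-1 max-pool three times in series yields effective pooling windows of k, 2k − 1 and 3k − 2, that is 5, 9 and 13 for k = 5. The function names max_pool_same and sppf_branches are hypothetical, and CAA is not modeled.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k max pooling; out-of-bounds cells are ignored,
    which is equivalent to -inf padding, so output size == input size."""
    h, w = len(x), len(x[0])
    r = k // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            # Collect the k x k window centred on (i, j), clipped to the grid.
            window = [
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def sppf_branches(x: Grid, k: int = 5) -> List[Grid]:
    """Hypothetical SPPF-style cascade: the same k x k pool is applied three
    times in series and every stage is kept."""
    y1 = max_pool_same(x, k)   # effective window k
    y2 = max_pool_same(y1, k)  # effective window 2k - 1
    y3 = max_pool_same(y2, k)  # effective window 3k - 2
    # A real network would concatenate these maps along the channel axis.
    return [x, y1, y2, y3]
```

For example, sppf_branches(x)[2] equals max_pool_same(x, 9) for any grid x. Reusing one small pool in series is what makes SPPF cheaper than the parallel 5, 9 and 13 pools of the original SPP; a tensor implementation would also wrap the cascade in 1 × 1 convolutions and concatenate the four stages along the channel axis.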


Author Biographies

Ziyang Xing, Hebei University of Science and Technology

Ziyang Xing was born in 2001 in Handan City, Hebei Province, China. He obtained his bachelor's degree in 2023. Currently, he is a master's student in computer science and technology at Hebei University of Science and Technology (China). His research interests are machine learning and computer vision.

Xuebin Xu, Hebei University of Science and Technology

Xuebin Xu was born in 1999 in Zhoukou City, Henan Province, China. He obtained his bachelor's degree in 2023. Currently, he is a master's student in computer science and technology at Hebei University of Science and Technology (China). His research interests are machine learning and computer vision.

Meiling Sun, Shijiazhuang Preschool Education College

Meiling Sun was born in 1998 in Shijiazhuang City, Hebei Province, China. In 2020, she received her bachelor's degree. In 2023, she received her master's degree from Hebei University of Science and Technology (China). Currently, she is a lecturer at Shijiazhuang Preschool Education College. Her research interests are machine learning and computer vision.

Kuihe Yang, Hebei University of Science and Technology

Kuihe Yang was born in 1966 in Handan, Hebei Province, China. He received the B.S. degree from Tianjin University (China) in 1988, the M.S. degree from University of Science and Technology Beijing (China) in 1997, and the Ph.D. degree in computer application technology from Xidian University (China) in 2004. From 2005 to 2007, he was a Postdoctoral Fellow at Army Engineering University of PLA (China). In 2011, he completed short-term training at Manchester University (UK). Currently, he is a professor and master's supervisor at Hebei University of Science and Technology (China). His research interests include database application technology, artificial intelligence, and machine learning.

References

X. Liu, C. Wang, and L. Liu, “Research on Pedestrian Detection Model and Compression Technology for UAV Images,” Sensors, vol. 22, no. 23, p. 9171, Nov. 2022, doi: 10.3390/s22239171.

I. Bisio, H. Haleem, C. Garibotto, F. Lavagetto, and A. Sciarrone, “Performance Evaluation and Analysis of Drone-Based Vehicle Detection Techniques From Deep Learning Perspective,” IEEE Internet of Things Journal, vol. 9, no. 13, pp. 10920–10935, Jul. 2022, doi: 10.1109/jiot.2021.3128065.

W. Sun, L. Dai, X. Zhang, P. Chang, and X. He, “RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring,” Applied Intelligence, vol. 52, no. 8, pp. 8448–8463, Oct. 2021, doi: 10.1007/s10489-021-02893-3.

S. H. Alsamhi et al., “UAV Computing-Assisted Search and Rescue Mission Framework for Disaster and Harsh Environment Mitigation,” Drones, vol. 6, no. 7, p. 154, Jun. 2022, doi: 10.3390/drones6070154.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, Jun. 2016, doi: 10.1109/cvpr.2016.91.

J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525, Jul. 2017, doi: 10.1109/cvpr.2017.690.

J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018, doi: 10.48550/arXiv.1804.02767.

Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO Series in 2021,” arXiv preprint arXiv:2107.08430, 2021, doi: 10.48550/arXiv.2107.08430.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, et al., “SSD: Single Shot MultiBox Detector,” Computer Vision – ECCV 2016: 14th European Conference, pp. 21–37, Oct. 2016, doi: 10.1007/978-3-319-46448-0_2.

R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448, doi: 10.1109/ICCV.2015.169.

S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 39, no. 6, pp. 1137–1149, 2017, doi: 10.1109/TPAMI.2016.2577031.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” arXiv preprint arXiv:1703.06870, 2017, doi: 10.48550/arXiv.1703.06870.

X. Lu, B. Li, Y. Yue, Q. Li, and J. Yan, “Grid R-CNN,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7355–7364, Jun. 2019, doi: 10.1109/cvpr.2019.00754.

J. Wan, B. Zhang, Y. Zhao, Y. Du, and Z. Tong, “VistrongerDet: Stronger Visual Information for Object Detection in VisDrone Images,” 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 2820–2829, Oct. 2021, doi: 10.1109/iccvw54120.2021.00316.

X. Li, W. Diao, Y. Mao, P. Gao, X. Mao, X. Li, and X. Sun, “OGMN: Occlusion-guided multi-task network for object detection in UAV images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 199, pp. 242–257, May 2023, doi: 10.1016/j.isprsjprs.2023.04.009.

X. Fu, G. Wei, X. Yuan, Y. Liang, and Y. Bo, “Efficient YOLOv7-Drone: An Enhanced Object Detection Approach for Drone Aerial Imagery,” Drones, vol. 7, no. 10, p. 616, Oct. 2023, doi: 10.3390/drones7100616.

Z. Song, L. Wang, G. Zhang, C. Jia, J. Bi, H. Wei, Y. Xia, C. Zhang, and L. Zhao, “Fast Detection of Multi-Direction Remote Sensing Ship Object Based on Scale Space Pyramid,” 2022 18th International Conference on Mobility, Sensing and Networking (MSN), pp. 1019–1024, Dec. 2022, doi: 10.1109/msn57253.2022.00165.

H. Wang, C. Liu, Y. Cai, L. Chen, and Y. Li, “YOLOv8-QSD: An Improved Small Object Detection Algorithm for Autonomous Vehicles Based on YOLOv8,” IEEE Transactions on Instrumentation and Measurement, vol. 73, pp. 1–16, 2024, doi: 10.1109/tim.2024.3379090.

P. Zhang, G. Zhang, and K. Yang, “APNet: Accurate Positioning Deformable Convolution for UAV Image Object Detection,” IEEE Latin America Transactions, vol. 22, no. 4, pp. 304–311, Apr. 2024, doi: 10.1109/tla.2024.10472961.

H. Lou, X. Duan, J. Guo, H. Liu, J. Gu, L. Bi, and H. Chen, “DC-YOLOv8: Small Size Object Detection Algorithm Based on Camera Sensor,” Apr. 2023, doi: 10.20944/preprints202304.0124.v1.

Z. Guo, C. Wang, G. Yang, Z. Huang, and G. Li, “MSFT-YOLO: Improved YOLOv5 Based on Transformer for Detecting Defects of Steel Surface,” Sensors, vol. 22, no. 9, p. 3467, May 2022, doi: 10.3390/s22093467.

Z. Song, Y. Zhang, Y. Liu, K. Yang, and M. Sun, “MSFYOLO: Feature fusion-based detection for small objects,” IEEE Latin America Transactions, vol. 20, no. 5, pp. 823–830, May 2022, doi: 10.1109/tla.2022.9693567.

C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A New Backbone that can Enhance Learning Capability of CNN,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun. 2020, doi: 10.1109/cvprw50498.2020.00203.

S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, and P. Torr, “Res2Net: A New Multi-Scale Backbone Architecture,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 652–662, Feb. 2021, doi: 10.1109/tpami.2019.2938758.

G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLO,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics

K. He, X. Zhang, S. Ren, and J. Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, Sep. 2015, doi: 10.1109/tpami.2015.2389824.

X. Cai, Q. Lai, Y. Wang, W. Wang, Z. Sun, and Y. Yao, “Poly Kernel Inception Network for Remote Sensing Detection,” 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 27706–27716, Jun. 2024, doi: 10.1109/cvpr52733.2024.02617.

T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, doi: 10.1109/cvpr.2017.106.

H. Li, P. Xiong, J. An, and L. Wang, “Pyramid Attention Network for Semantic Segmentation,” arXiv preprint arXiv:1805.10180, 2018, doi: 10.48550/arXiv.1805.10180.

D. Du, P. Zhu, L. Wen, X. Bian, H. Lin, Q. Hu, T. Peng, J. Zheng, X. Wang, Y. Zhang, et al., “VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results,” 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 213–226, Oct. 2019, doi: 10.1109/ICCVW.2019.00030.

D. Du et al., “The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking,” Computer Vision – ECCV 2018, pp. 375–391, 2018, doi: 10.1007/978-3-030-01249-6_23.

G. Jocher, “YOLOv5 by Ultralytics,” May 2020. [Online]. Available: https://github.com/ultralytics/yolov5

W. Yu, T. Yang, and C. Chen, “Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection,” 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3257–3266, Jan. 2021, doi: 10.1109/wacv48630.2021.00330.

Y. Zhao et al., “DETRs Beat YOLOs on Real-time Object Detection,” 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16965–16974, Jun. 2024, doi: 10.1109/cvpr52733.2024.01605.

Y. Yang, X. Gao, Y. Wang, and S. Song, “VAMYOLOX: An Accurate and Efficient Object Detection Algorithm Based on Visual Attention Mechanism for UAV Optical Sensors,” IEEE Sensors Journal, vol. 23, no. 11, pp. 11139–11155, 2023, doi: 10.1109/jsen.2022.3219199.

F. Yang, H. Fan, P. Chu, E. Blasch, and H. Ling, “Clustered Object Detection in Aerial Images,” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8310–8319, Oct. 2019, doi: 10.1109/iccv.2019.00840.

C. Li, T. Yang, S. Zhu, C. Chen, and S. Guan, “Density Map Guided Object Detection in Aerial Images,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 737–746, Jun. 2020, doi: 10.1109/cvprw50498.2020.00103.

X. Yu, Y. Gong, N. Jiang, Q. Ye, and Z. Han, “Scale Match for Tiny Person Detection,” 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Mar. 2020, doi: 10.1109/wacv45572.2020.9093394.

Y. Zeng, T. Zhang, W. He, and Z. Zhang, “YOLOv7-UAV: An Unmanned Aerial Vehicle Image Object Detection Algorithm Based on Improved YOLOv7,” Electronics, vol. 12, no. 14, p. 3141, Jul. 2023, doi: 10.3390/electronics12143141.

Published

2025-08-04

How to Cite

Xing, Z., Xu, X., Sun, M., & Yang, K. (2025). MHNet: Multi-scale Hierarchical Extraction Network for Small Object Detection in UAV Images. IEEE Latin America Transactions, 23(9), 770–777. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/9496