APNet: Accurate Positioning Deformable Convolution for UAV Image Object Detection

Authors

Peiran Zhang, Guoxin Zhang, and Kuihe Yang, Hebei University of Science and Technology

Keywords:

object detection, unmanned aerial vehicle (UAV) images, deformable convolution (DC), attention mechanism

Abstract

Object detection in unmanned aerial vehicle (UAV) images has received increasing attention in recent years owing to its wide application in military and civil fields. Current object detection methods perform well in generic scenarios, but the vast number of small objects and the extremely dense distributions in UAV images make such objects difficult to capture, resulting in sub-optimal performance. In this paper, we propose APNet, a UAV image object detection framework that addresses this issue through fine-grained deformable convolution (DC) and effective feature fusion. First, we design an accurate positioning deformable convolution (APDC), which changes the kernel shape dynamically to produce refined features, especially in regions where objects gather densely. Specifically, a positional information enhancement attention (PEA) is designed to generate more accurate convolutional position offsets conditioned on object positions. APDC therefore alleviates the inflexible deformation of vanilla DC and adapts better to the shapes of different objects, discriminating among multiple objects in densely distributed areas in a fine-grained way. Second, we propose an effective cross-layer feature fusion (ECF) to integrate multi-scale features effectively and aggregate attentive features dynamically. Extensive experiments conducted on VisDrone and UAVDT demonstrate the universality and effectiveness of our APNet, which achieves 29.8 mAP and 48.7 mAP50, an improvement of 2.2 and 3.5, respectively, over the state-of-the-art (SOTA) method.
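As a rough sketch of the core mechanism described above (not the authors' implementation), the following PyTorch snippet shows how an attention branch can refine the features that predict the sampling offsets of a deformable convolution, using torchvision.ops.deform_conv2d. The class names (PEABlock, APDConv) and the coordinate-attention-style gating are assumptions made purely for illustration.

import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class PEABlock(nn.Module):
    """Toy positional-attention branch: direction-aware pooling along H and W
    (in the spirit of coordinate attention) produces gates that emphasize
    positions where objects are likely to lie. Hypothetical, for illustration."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a_h = torch.sigmoid(self.conv_h(x.mean(dim=3, keepdim=True)))  # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(x.mean(dim=2, keepdim=True)))  # (B, C, 1, W)
        return x * a_h * a_w  # broadcast both gates over the feature map

class APDConv(nn.Module):
    """Deformable convolution whose offsets are predicted from attention-refined features."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, k, k))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.pea = PEABlock(in_ch)
        # Two offsets (dy, dx) for each of the k*k kernel sampling locations.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x):
        offsets = self.offset_conv(self.pea(x))  # (B, 2*k*k, H, W)
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)

x = torch.randn(1, 64, 80, 80)   # e.g. a backbone feature map
y = APDConv(64, 64)(x)
print(y.shape)                   # torch.Size([1, 64, 80, 80])

Zero-initializing the offset branch makes the layer behave as a standard convolution at the start of training, so deformations are learned gradually; this is common practice with deformable convolutions in general rather than anything specific to APNet.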

Author Biographies

Peiran Zhang, Hebei University of Science and Technology

Peiran Zhang was born in Zhengzhou, Henan Province, China, in 1999. He received his Bachelor’s degree in 2022 and is now a master’s student in Computer Science and Technology at Hebei University of Science and Technology (China). His research interest is computer vision.

Guoxin Zhang, Hebei University of Science and Technology

Guoxin Zhang was born in 1998 in Xingtai, Hebei Province, China. He received his Bachelor’s degree in 2021 and is now pursuing his master’s degree at Hebei University of Science and Technology (China). His research interest is computer vision.

Kuihe Yang, Hebei University of Science and Technology

Kuihe Yang was born in 1966 in Handan, Hebei Province, China. He received the B.S. degree from Tianjin University (China) in 1988, the M.S. degree from the University of Science and Technology Beijing (China) in 1997, and the Ph.D. degree in computer application technology from Xidian University (China) in 2004. From 2005 to 2007, he was a Postdoctoral Fellow at the Army Engineering University of PLA (China), and in 2011 he undertook short-term training at Manchester University (UK). He is currently a professor and master’s supervisor at Hebei University of Science and Technology (China). His research interests include database application technology, artificial intelligence, and machine learning.

References

R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448, doi:10.1109/ICCV.2015.169.

S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 39, no. 6, pp. 1137–1149, 2017, doi:10.1109/TPAMI.2016.2577031.

Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6154–6162, doi:10.1109/cvpr.2018.00644.

J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018, doi:10.48550/arXiv.1804.02767.

G. Jocher, “YOLOv5 by Ultralytics,” May 2020. [Online]. Available: https://github.com/ultralytics/yolov5

G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLO,” Jan. 2023. [Online]. Available: https://github.com/ultralytics/ultralytics

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2999–3007, doi:10.1109/iccv.2017.324.

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 213–229, doi:10.1007/978-3-030-58452-8_13.

A. Vaswani, N. M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008, doi:10.48550/arXiv.1706.03762.

Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,” Proceedings of the IEEE, vol. 111, no. 3, pp. 257–276, 2023, doi:10.1109/jproc.2023.3238524.

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” arXiv preprint arXiv:2010.04159, 2020, doi:10.48550/arXiv.2010.04159.

H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022, doi:10.48550/arXiv.2203.03605.

W. Lv, S. Xu, Y. Zhao, G. Wang, J. Wei, C. Cui, Y. Du, Q. Dang, and Y. Liu, “DETRs beat YOLOs on real-time object detection,” arXiv preprint arXiv:2304.08069, 2023, doi:10.48550/arXiv.2304.08069.

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015, doi:10.1038/nature14539.

B. Du, Y. Huang, J. Chen, and D. Huang, “Adaptive sparse convolutional networks with global context enhancement for faster object detection on drone images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 13435–13444, doi:10.1109/cvpr52729.2023.01291.

F. Yang, H. Fan, P. Chu, E. Blasch, and H. Ling, “Clustered object detection in aerial images,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 8310–8319, doi:10.1109/iccv.2019.00840.

W. Yu, T. Yang, and C. Chen, “Towards resolving the challenge of long-tail distribution in UAV images for object detection,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 3257–3266, doi:10.1109/wacv48630.2021.00330.

Z. Liu, G. Gao, L. Sun, and Z. Fang, “HRDNet: High-resolution detection network for small objects,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2021, pp. 1–6, doi:10.1109/icme51207.2021.9428241.

C. Li, T. Yang, S. Zhu, C. Chen, and S. Guan, “Density map guided object detection in aerial images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 737–746, doi:10.1109/cvprw50498.2020.00103.

C. Yang, Z. Huang, and N. Wang, “QueryDet: Cascaded sparse query for accelerating high-resolution small object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 13658–13667, doi:10.1109/cvpr52688.2022.01330.

J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 764–773, doi:10.1109/iccv.2017.89.

X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: More deformable, better results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9300–9308, doi:10.1109/cvpr.2019.00953.

H. Zhang, C. Xu, and S. Zhang, “Inner-IoU: More effective intersection over union loss with auxiliary bounding box,” arXiv preprint arXiv:2311.02877, 2023, doi:10.48550/arXiv.2311.02877.

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778, doi:10.1109/cvpr.2016.90.

Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13708–13717, doi:10.1109/cvpr46437.2021.01350.

T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944, doi:10.1109/cvpr.2017.106.

X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG: Making VGG-style ConvNets great again,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 13728–13737, doi:10.1109/cvpr46437.2021.01352.

D. Ouyang, S. He, G. Zhang, M. Luo, H. Guo, J. Zhan, and Z. Huang, “Efficient multi-scale attention module with cross-spatial learning,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5, doi:10.1109/icassp49357.2023.10096516.

H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 658–666, doi:10.1109/cvpr.2019.00075.

Y. Cao, Z. He, L. Wang, W. Wang, Y. Yuan, D. Zhang, J. Zhang, P. Zhu, L. Van Gool, J. Han et al., “VisDrone-DET2021: The vision meets drone object detection challenge results,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021, pp. 2847–2854, doi:10.1109/iccvw54120.2021.00319.

D. Du, Y. Qi, H. Yu, Y. Yang, K. Duan, G. Li, W. Zhang, Q. Huang, and Q. Tian, “The unmanned aerial vehicle benchmark: Object detection and tracking,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 375–391, doi:10.1007/978-3-030-01249-6_23.

Y. Yang, X. Gao, Y. Wang, and S. Song, “VAMYOLOX: An accurate and efficient object detection algorithm based on visual attention mechanism for UAV optical sensors,” IEEE Sensors Journal, vol. 23, no. 11, pp. 11139–11155, 2023, doi:10.1109/jsen.2022.3219199.

S. Deng, S. Li, K. Xie, W. Song, X. Liao, A. Hao, and H. Qin, “A global-local self-adaptive network for drone-view object detection,” IEEE Transactions on Image Processing, vol. 30, pp. 1556–1569, 2021, doi:10.1109/tip.2020.3045636.

Published

2024-03-13

How to Cite

Zhang, P., Zhang, G., & Yang, K. (2024). APNet: Accurate Positioning Deformable Convolution for UAV Image Object Detection. IEEE Latin America Transactions, 22(4), 304–311. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/8716