APNet: Accurate Positioning Deformable Convolution for UAV Image Object Detection
object detection, unmanned aerial vehicle (UAV) images, deformable convolution (DC), attention mechanism
Abstract
Object detection in unmanned aerial vehicle (UAV) images has received increasing attention in recent years because of its wide application in military and civil fields. Current object detection methods perform well in generic scenarios, but the large number of small objects and their extremely dense distribution in UAV images make them difficult to capture, resulting in sub-optimal performance. In this paper, we propose APNet, a UAV image object detection framework that addresses these issues through fine-grained deformable convolution (DC) and effective feature fusion. First, we design an accurate positioning deformable convolution (APDC), which changes the kernel shape dynamically to produce refined features, especially in regions where objects are densely clustered. Specifically, a positional information enhancement attention (PEA) is designed to generate more accurate convolutional position offsets that depend on the object position. APDC thereby alleviates the inflexible deformation of vanilla DC and adapts better to the shapes of different objects, which allows it to discriminate between multiple objects in densely distributed areas in a fine-grained way. Second, we propose an effective cross-layer feature fusion (ECF) to integrate multi-scale features and aggregate attentive features dynamically. Extensive experiments on VisDrone and UAVDT demonstrate the universality and effectiveness of APNet, which achieves 29.8 and 48.7 in mAP and mAP50, respectively. Compared to the state-of-the-art (SOTA) method, APNet achieves an improvement of 2.2 and 3.5 in mAP and mAP50, respectively.
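
To make the role of the convolutional position offsets concrete, the following is a minimal sketch of how a vanilla deformable convolution samples its input at one output location. It is not the APDC or PEA design, whose details are not given in this abstract. The function names `bilinear` and `deform_conv_at` are hypothetical, and the offsets are supplied by hand here, whereas in a real network they would be predicted from the features.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); out-of-range reads count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val

def deform_conv_at(img, weights, offsets, cy, cx):
    """Output of a 3x3 deformable convolution centred at (cy, cx).

    weights: 3x3 kernel values.
    offsets: 3x3 grid of (dy, dx) pairs, one learned shift per kernel tap
             (hand-written in this illustration).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            # Regular grid position (cy + i - 1, cx + j - 1) plus the per-tap offset.
            out += weights[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out
```

With all offsets set to zero this reduces to an ordinary 3x3 convolution; non-zero offsets move each kernel tap independently, which is the flexibility that an offset-predicting branch exploits.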
