GASegNet: Global Self-Attention Mechanism Meets Structural Feature Fusion for Point Cloud Semantic Segmentation
Keywords: Autonomous driving, Spherical projection, Point cloud semantic segmentation, Self-attention mechanism

Abstract
With the rapid development of autonomous driving, point cloud semantic segmentation, a key technology for environment perception in autonomous driving systems, still suffers from weak connections between local features, high computational cost, and an inability to meet real-time requirements. To address these problems, this paper proposes a lightweight and efficient point cloud semantic segmentation network based on spherical projection with an encoder-decoder structure. The encoder combines a global self-attention mechanism, which captures global information, with multi-scale convolution; this module unifies local feature extraction and global context for high-dimensional semantic information. To reduce the computational cost, a feature fusion module is introduced to enhance the compactness of the range image obtained by projecting the point cloud. The decoder upsamples multi-resolution feature maps with bilinear interpolation and introduces multiple auxiliary segmentation heads to further improve accuracy. Experiments on the SemanticKITTI and SemanticPOSS datasets show that, compared with the CENet architecture, the proposed approach improves mIoU by 4.3% and 2.6% on the respective datasets, substantiating its efficacy. The code is available at GitHub: https://github.com/haifeng925/GASegNet.
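The spherical projection that turns a LiDAR sweep into the range image mentioned above can be sketched as follows. This is a minimal NumPy sketch of the standard projection used by range-image methods such as RangeNet++; the resolution (64×2048) and vertical field of view (+3° to −25°) are the usual HDL-64E/SemanticKITTI settings, not parameters taken from GASegNet itself.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image."""
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)          # range of each point

    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    # Normalise both angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W           # column from azimuth
    v = (1.0 - (pitch - fov_down_rad) / fov) * H  # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.full((H, W), -1.0, dtype=np.float32)
    # Write points far-to-near so the closest point wins when
    # several points fall into the same pixel.
    order = np.argsort(r)[::-1]
    range_image[v[order], u[order]] = r[order]
    return range_image, u, v
```

In practice the same (v, u) indices are reused to scatter additional per-point channels (x, y, z, remission) into the image, and to project per-pixel labels back onto the 3D points after segmentation.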
References
J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 9297–9307, doi: 10.1109/ICCV.2019.00939.
Y. Pan, B. Gao, J. Mei, S. Geng, C. Li, and H. Zhao, “Semanticposs: A point cloud dataset with large quantity of dynamic instances,” in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 687–693, doi: 10.1109/IV47402.2020.9304596.
C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660, doi: 10.1109/CVPR.2017.16.
C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in neural information processing systems, vol. 30, 2017, doi: 10.48550/arXiv.1706.02413.
X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 4840–4851, doi: 10.1109/CVPR52733.2024.00463.
P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017, doi: 10.48550/arXiv.1710.10903.
X. Yan, C. Zheng, Z. Li, S. Wang, and S. Cui, “Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5588–5597, doi: 10.1109/CVPR42600.2020.00563.
A. Milioto, I. Vizzo, J. Behley, and C. Stachniss, “Rangenet++: Fast and accurate lidar semantic segmentation,” in 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2019, pp. 4213–4220, doi: 10.1109/IROS40897.2019.8967762.
C. Xu, B. Wu, Z. Wang, W. Zhan, P. Vajda, K. Keutzer, and M. Tomizuka, “Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16. Springer, 2020, pp. 1–19, doi: 10.1007/978-3-030-58604.
Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, and A. Markham, “Randla-net: Efficient semantic segmentation of large-scale point clouds,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11108–11117, doi: 10.1109/CVPR42600.2020.01112.
Y. Zhao, L. Bai, and X. Huang, “Fidnet: Lidar point cloud semantic segmentation with fully interpolation decoding,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 4453–4458, doi: 10.1109/IROS51168.2021.9636385.
H.-X. Cheng, X.-F. Han, and G.-Q. Xiao, “Cenet: Toward concise and efficient lidar semantic segmentation for autonomous driving,” in 2022 IEEE international conference on multimedia and expo (ICME). IEEE, 2022, pp. 01–06, doi: 10.1109/ICME52920.2022.9859693.
H. Thomas, C. R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, and L. J. Guibas, “Kpconv: Flexible and deformable convolution for point clouds,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6411–6420, doi: 10.1109/ICCV.2019.00651.
S. Qiu, S. Anwar, and N. Barnes, “Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1757–1767, doi: 10.1109/CVPR46437.2021.00180.
Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1–12, 2019, doi: 10.1145/3326362.
C. Chen, L. Z. Fragonara, and A. Tsourdos, “Gapointnet: Graph attention based point neural network for exploiting local feature of point cloud,” Neurocomputing, vol. 438, pp. 122–132, 2021, doi: 10.1016/j.neucom.2021.01.095.
D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition,” in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2015, pp. 922–928, doi: 10.1109/IROS.2015.7353481.
R. Klokov and V. Lempitsky, “Escape from cells: Deep kd-networks for the recognition of 3d point cloud models,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 863–872, doi: 10.1109/ICCV.2017.99.
H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz, “Splatnet: Sparse lattice networks for point cloud processing,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2530–2539, doi: 10.1109/CVPR.2018.00268.
X. Zhu, H. Zhou, T. Wang, F. Hong, Y. Ma, W. Li, H. Li, and D. Lin, “Cylindrical and asymmetrical 3d convolution networks for lidar segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 9939–9948, doi: 10.1109/TPAMI.2021.3098789.
R. Cheng, R. Razani, E. Taghavi, E. Li, and B. Liu, “(af)2-s3net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12547–12556, doi: 10.1109/CVPR46437.2021.01236.
B. Wu, A. Wan, X. Yue, and K. Keutzer, “Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud,” in 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 1887–1893, doi: 10.1109/ICRA.2018.8462926.
B. Wu, X. Zhou, S. Zhao, X. Yue, and K. Keutzer, “Squeezesegv2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud,” in 2019 international conference on robotics and automation (ICRA). IEEE, 2019, pp. 4376–4382, doi: 10.1109/ICRA.2019.8793495.
T. Cortinhal, G. Tzelepis, and E. Erdal Aksoy, “Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds,” in Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020, Proceedings, Part II. Springer, 2020, pp. 207–222, doi: 10.1007/978-3-030-64559.
E. E. Aksoy, S. Baci, and S. Cavdar, “Salsanet: Fast road and vehicle segmentation in lidar point clouds for autonomous driving,” in 2020 IEEE intelligent vehicles symposium (IV). IEEE, 2020, pp. 926–932, doi: 10.1109/IV47402.2020.9304694.
D. Kochanov, F. K. Nejadasl, and O. Booij, “Kprnet: Improving projection-based lidar semantic segmentation,” arXiv preprint arXiv:2007.12668, 2020, doi: 10.48550/arXiv.2007.12668.
R. Razani, R. Cheng, E. Taghavi, and L. Bingbing, “Lite-hdseg: Lidar semantic segmentation using lite harmonic dense convolutions,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 9550–9556, doi: 10.1109/ICRA48506.2021.9561171.
J. Li, Y. Wen, and L. He, “Scconv: Spatial and channel reconstruction convolution for feature redundancy,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6153–6162, doi: 10.1109/CVPR52729.2023.00596.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012, doi: 10.1145/3065386.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017, doi: 10.48550/arXiv.1704.04861.
S. Wang, J. Zhu, and R. Zhang, “Meta-rangeseg: Lidar sequence semantic segmentation using multiple feature aggregation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9739–9746, 2022, doi: 10.1109/LRA.2022.3191040.
A. Athar, E. Li, S. Casas, and R. Urtasun, “4d-former: Multimodal 4d panoptic segmentation,” in Conference on Robot Learning. PMLR, 2023, pp. 2151–2164, doi: 10.48550/arXiv.2311.01520.
D. Ye, W. Chen, Z. Zhou, Y. Xie, Y. Wang, P. Wang, and H. Foroosh, “Lidarmultinet: Unifying lidar semantic segmentation, 3d object detection, and panoptic segmentation in a single multi-task network,” arXiv preprint arXiv:2206.11428, 2022, doi: 10.48550/arXiv.2209.09385.
Y. Liu, R. Chen, X. Li, L. Kong, Y. Yang, Z. Xia, Y. Bai, X. Zhu, Y. Ma, Y. Li et al., “Uniseg: A unified multi-modal lidar segmentation network and the openpcseg codebase,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 21662–21673, doi: 10.1109/ICCV51070.2023.01980.
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017, doi: 10.48550/arXiv.1711.05101.
S. Li, X. Chen, Y. Liu, D. Dai, C. Stachniss, and J. Gall, “Multi-scale interaction for real-time lidar data segmentation on an embedded platform,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 738–745, 2021, doi: 10.1109/LRA.2021.3132059.
B. Graham, M. Engelcke, and L. Van Der Maaten, “3d semantic segmentation with submanifold sparse convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9224–9232, doi: 10.1109/CVPR.2018.00961.
H. Tang, Z. Liu, S. Zhao, Y. Lin, J. Lin, H. Wang, and S. Han, “Searching efficient 3d architectures with sparse point-voxel convolution,” in European conference on computer vision. Springer, 2020, pp. 685–702, doi: 10.1007/978-3-030-58604.
Y. A. Alnaggar, M. Afifi, K. Amer, and M. ElHelw, “Multi projection fusion for real-time semantic segmentation of 3d lidar point clouds,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 1800–1809, doi: 10.1109/WACV48630.2021.00184.