GASegNet: Global Self-Attention Mechanism Meets Structural Feature Fusion for Point Cloud Semantic Segmentation

Authors

Xu Lu, Haijun Liu, Guang'an Luo, Zhike Chen, Cheng Zhou, Xinyu Wu, Jun Liu

Keywords:

Autonomous driving, Spherical projection, Point cloud semantic segmentation, Self-attention mechanism

Abstract

With the rapid development of autonomous driving technology, semantic segmentation has become a key component of environment perception, yet existing methods still suffer from weak connections between local features, high computational cost, and an inability to meet real-time requirements. To address these problems, this paper proposes a lightweight and efficient point cloud semantic segmentation network based on spherical projection with an encoder-decoder structure. The encoder combines a global self-attention mechanism, which captures global information, with multi-scale convolution, thereby unifying local feature extraction and global context for high-dimensional semantic information. To alleviate the high computational cost, a feature fusion module is introduced to enhance the compactness of the range-image structure obtained by projecting the point cloud. The decoder upsamples multi-resolution feature maps with bilinear interpolation and introduces multiple auxiliary segmentation heads to further improve accuracy. Experiments on the SemanticKITTI and SemanticPOSS datasets show that, compared with the CENet architecture, the proposed approach improves mIoU by 4.3% and 2.6%, respectively, substantiating its efficacy. The code is available on GitHub at https://github.com/haifeng925/GASegNet.
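To make the spherical-projection step concrete, the sketch below maps LiDAR points onto a range image in the standard way used by projection-based methods such as RangeNet++. The function name and the sensor parameters (64 rows, 2048 columns, +3°/-25° vertical field of view, typical for the SemanticKITTI sensor) are illustrative assumptions, not details taken from the paper; GASegNet's actual preprocessing may differ.

```python
import math

def spherical_projection(points, h=64, w=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) LiDAR points onto an h x w range image.

    Returns the range image (list of rows, -1.0 in empty pixels) and the
    (row, col) pixel assigned to each input point.  Sensor parameters are
    illustrative defaults, not taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)          # total vertical field of view

    image = [[-1.0] * w for _ in range(h)]
    pixels = []
    for x, y, z in points:
        depth = math.sqrt(x * x + y * y + z * z)  # range of the point
        yaw = math.atan2(y, x)                    # azimuth angle
        pitch = math.asin(z / depth)              # elevation angle

        u = 0.5 * (1.0 - yaw / math.pi)           # column coordinate in [0, 1]
        v = 1.0 - (pitch + abs(fov_down)) / fov   # row coordinate in [0, 1]

        col = min(w - 1, max(0, int(u * w)))
        row = min(h - 1, max(0, int(v * h)))
        image[row][col] = depth                   # later points overwrite earlier ones
        pixels.append((row, col))
    return image, pixels
```

A point straight ahead of the sensor, e.g. `(10.0, 0.0, 0.0)`, lands in the middle column of the image with pixel value equal to its range; the network then segments this 2D image and the labels are mapped back to the 3D points via the stored pixel indices.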


Author Biographies

Xu Lu, Guangdong Polytechnic Normal University

Xu Lu is a Professor in the School of Computer Science, Guangdong Polytechnic Normal University, China. He received the B.S. degree from Nanchang University, Jiangxi, China, in 2006, and the M.E. and Ph.D. degrees from Guangdong University of Technology, Guangdong, China, in 2009 and 2015, respectively. His research interests include artificial intelligence and smart systems.

Haijun Liu, Guangdong Polytechnic Normal University

Haijun Liu is currently pursuing a master's degree in artificial intelligence at the Institute of Interdisciplinary Studies, Guangdong Polytechnic Normal University. His main research directions include robotics and artificial intelligence.

Guang'an Luo, Guangdong Polytechnic Normal University

Guang'an Luo is currently pursuing a master’s degree in New-Generation Electronic Information Technology at the School of Electronics and Information Technology, Guangdong Polytechnic Normal University. His main research directions include the Internet of Things and artificial intelligence.

Zhike Chen, Guangdong Polytechnic Normal University

Zhike Chen is currently pursuing a master’s degree in Control Science and Engineering at the School of Computer Science, Guangdong Polytechnic Normal University. His main research directions include the Internet of Things and machine vision.

Cheng Zhou, Guangdong Polytechnic Normal University

Cheng Zhou is currently pursuing a master’s degree in Control Science and Engineering at the School of Computer Science, Guangdong Polytechnic Normal University. His main research directions include the Internet of Things and artificial intelligence.

Xinyu Wu, Shenzhen Institute of Advanced Technology

Xinyu Wu (Senior Member) received the B.E. and M.E. degrees from the Department of Automation, University of Science and Technology of China, Hefei, China, in 2001 and 2004, respectively, and the Ph.D. degree from The Chinese University of Hong Kong in 2008. He is currently a Professor with the Shenzhen Institute of Advanced Technology, Shenzhen, China, the Director of the Center for Intelligent Bionics, and the Director of the Guangdong Provincial Key Lab of Robotics and Intelligent Systems. He has authored or co-authored more than 260 journal and conference papers and 2 monographs. His research interests include wearable robotics, human-machine interaction, and intelligent systems. He received the GaiTech Best Paper in Robotics Award at the IEEE International Conference on Information and Automation (ICIA) in 2018 and the Best Application Paper Award at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) in 2019, among others. Professor Wu has been an Associate Editor of several journals, including IEEE Transactions on Systems, Man, and Cybernetics: Systems, IEEE Transactions on Automation Science and Engineering, and IEEE Robotics and Automation Letters.

Jun Liu, Guangdong Polytechnic Normal University

Jun Liu received the B.S. degree in electronic information engineering in 2009 from Qiqihar University, Qiqihar, China, and the M.S. and Ph.D. degrees in control science and engineering in 2012 and 2015, respectively, from Guangdong University of Technology, Guangzhou, China. He is currently an associate professor at Guangdong Polytechnic Normal University in Guangzhou, China. His research interests mainly include VSLAM and intelligent mobile robots.

References

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 9297–9307, doi: 10.1109/ICCV.2019.00939.

Y. Pan, B. Gao, J. Mei, S. Geng, C. Li, and H. Zhao, “Semanticposs: A point cloud dataset with large quantity of dynamic instances,” in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 687–693, doi: 10.1109/IV47402.2020.9304596.

C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660, doi: 10.1109/CVPR.2017.16.

C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” Advances in neural information processing systems, vol. 30, 2017, doi: 10.48550/arXiv.1706.02413.

X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 4840–4851, doi: 10.1109/CVPR52733.2024.00463.

P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017, doi: 10.48550/arXiv.1710.10903.

X. Yan, C. Zheng, Z. Li, S. Wang, and S. Cui, “Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5588–5597, doi: 10.1109/CVPR42600.2020.00563.

A. Milioto, I. Vizzo, J. Behley, and C. Stachniss, “Rangenet++: Fast and accurate lidar semantic segmentation,” in 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2019, pp. 4213–4220, doi: 10.1109/IROS40897.2019.8967762.

C. Xu, B. Wu, Z. Wang, W. Zhan, P. Vajda, K. Keutzer, and M. Tomizuka, “Squeezesegv3: Spatially-adaptive convolution for efficient point-cloud segmentation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16. Springer, 2020, pp. 1–19, doi: 10.1007/978-3-030-.

Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, and A. Markham, “Randla-net: Efficient semantic segmentation of large-scale point clouds,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11108–11117, doi: 10.1109/CVPR42600.2020.01112.

Y. Zhao, L. Bai, and X. Huang, “Fidnet: Lidar point cloud semantic segmentation with fully interpolation decoding,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 4453–4458, doi: 10.1109/IROS51168.2021.9636385.

H.-X. Cheng, X.-F. Han, and G.-Q. Xiao, “Cenet: Toward concise and efficient lidar semantic segmentation for autonomous driving,” in 2022 IEEE international conference on multimedia and expo (ICME). IEEE, 2022, pp. 01–06, doi: 10.1109/ICME52920.2022.9859693.

H. Thomas, C. R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, and L. J. Guibas, “Kpconv: Flexible and deformable convolution for point clouds,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6411–6420, doi: 10.1109/ICCV.2019.00651.

S. Qiu, S. Anwar, and N. Barnes, “Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1757–1767, doi: 10.1109/CVPR46437.2021.00180.

Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1–12, 2019, doi: 10.1145/3326362.

C. Chen, L. Z. Fragonara, and A. Tsourdos, “Gapointnet: Graph attention based point neural network for exploiting local feature of point cloud,” Neurocomputing, vol. 438, pp. 122–132, 2021, doi: 10.1016/j.neucom.2021.01.095.

D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition,” in 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2015, pp. 922–928, doi: 10.1109/IROS.2015.7353481.

R. Klokov and V. Lempitsky, “Escape from cells: Deep kd-networks for the recognition of 3d point cloud models,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 863–872, doi: 10.1109/ICCV.2017.99.

H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz, “Splatnet: Sparse lattice networks for point cloud processing,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2530–2539, doi: 10.1109/CVPR.2018.00268.

X. Zhu, H. Zhou, T. Wang, F. Hong, Y. Ma, W. Li, H. Li, and D. Lin, “Cylindrical and asymmetrical 3d convolution networks for lidar segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 9939–9948, doi: 10.1109/TPAMI.2021.3098789.

R. Cheng, R. Razani, E. Taghavi, E. Li, and B. Liu, “(AF)2-S3Net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12547–12556, doi: 10.1109/CVPR46437.2021.01236.

B. Wu, A. Wan, X. Yue, and K. Keutzer, “Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud,” in 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 1887–1893, doi: 10.1109/ICRA.2018.8462926.

B. Wu, X. Zhou, S. Zhao, X. Yue, and K. Keutzer, “Squeezesegv2: Improved model structure and unsupervised domain adaptation for road-object segmentation from a lidar point cloud,” in 2019 international conference on robotics and automation (ICRA). IEEE, 2019, pp. 4376–4382, doi: 10.1109/ICRA.2019.8793495.

T. Cortinhal, G. Tzelepis, and E. Erdal Aksoy, “Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds,” in Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020, Proceedings, Part II. Springer, 2020, pp. 207–222, doi: 10.1007/978-3-030-64559.

E. E. Aksoy, S. Baci, and S. Cavdar, “Salsanet: Fast road and vehicle segmentation in lidar point clouds for autonomous driving,” in 2020 IEEE intelligent vehicles symposium (IV). IEEE, 2020, pp. 926–932, doi: 10.1109/IV47402.2020.9304694.

D. Kochanov, F. K. Nejadasl, and O. Booij, “Kprnet: Improving projection-based lidar semantic segmentation,” arXiv preprint arXiv:2007.12668, 2020, doi: 10.48550/arXiv.2007.12668.

R. Razani, R. Cheng, E. Taghavi, and L. Bingbing, “Lite-hdseg: Lidar semantic segmentation using lite harmonic dense convolutions,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 9550–9556, doi: 10.1109/ICRA48506.2021.9561171.

J. Li, Y. Wen, and L. He, “Scconv: Spatial and channel reconstruction convolution for feature redundancy,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6153–6162, doi: 10.1109/CVPR52729.2023.00596.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012, doi: 10.1145/3065386.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017, doi: 10.48550/arXiv.1704.04861.

S. Wang, J. Zhu, and R. Zhang, “Meta-rangeseg: Lidar sequence semantic segmentation using multiple feature aggregation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9739–9746, 2022, doi: 10.1109/LRA.2022.3191040.

A. Athar, E. Li, S. Casas, and R. Urtasun, “4d-former: Multimodal 4d panoptic segmentation,” in Conference on Robot Learning. PMLR, 2023, pp. 2151–2164, doi: 10.48550/ARXIV.2311.01520.

D. Ye, W. Chen, Z. Zhou, Y. Xie, Y. Wang, P. Wang, and H. Foroosh, “Lidarmultinet: Unifying lidar semantic segmentation, 3d object detection, and panoptic segmentation in a single multi-task network,” arXiv preprint arXiv:2206.11428, 2022, doi: 10.48550/arXiv.2209.09385.

Y. Liu, R. Chen, X. Li, L. Kong, Y. Yang, Z. Xia, Y. Bai, X. Zhu, Y. Ma, Y. Li et al., “Uniseg: A unified multi-modal lidar segmentation network and the openpcseg codebase,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 21662–21673, doi: 10.1109/ICCV51070.2023.01980.

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017, doi: 10.48550/arXiv.1711.05101.

S. Li, X. Chen, Y. Liu, D. Dai, C. Stachniss, and J. Gall, “Multi-scale interaction for real-time lidar data segmentation on an embedded platform,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 738–745, 2021, doi: 10.1109/LRA.2021.3132059.

B. Graham, M. Engelcke, and L. Van Der Maaten, “3d semantic segmentation with submanifold sparse convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9224–9232, doi: 10.1109/CVPR.2018.00961.

H. Tang, Z. Liu, S. Zhao, Y. Lin, J. Lin, H. Wang, and S. Han, “Searching efficient 3d architectures with sparse point-voxel convolution,” in European conference on computer vision. Springer, 2020, pp. 685–702, doi: 10.1007/978-3-030-58604.

Y. A. Alnaggar, M. Afifi, K. Amer, and M. ElHelw, “Multi projection fusion for real-time semantic segmentation of 3d lidar point clouds,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 1800–1809, doi: 10.1109/WACV48630.2021.00184.

Published

2026-04-09

How to Cite

Lu, X., Liu, H., Luo, G., Chen, Z., Zhou, C., Wu, X., & Liu, J. (2026). GASegNet: Global Self-Attention Mechanism Meets Structural Feature Fusion for Point Cloud Semantic Segmentation. IEEE Latin America Transactions, 24(5), 484–493. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/10124