Code Smell Detection Research Based on Pre-training and Stacking Models

Authors

  • Dongwen Zhang The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018; Hebei Technology Innovation Center of Intelligent IoT, Shijiazhuang, Hebei 050018, China.
  • Shuai Song The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018. https://orcid.org/0009-0006-0126-5299
  • Yang Zhang The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018; Hebei Technology Innovation Center of Intelligent IoT, Shijiazhuang, Hebei 050018, China. https://orcid.org/0000-0001-8641-2660
  • Haiyang Liu The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018. https://orcid.org/0009-0007-5754-5043
  • Gaojie Shen The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018. https://orcid.org/0009-0000-5598-6485

Keywords:

Code Smell, Pre-training Model, Textual Features, Stacking Model

Abstract

Code smell detection primarily adopts heuristic-based, machine learning, and deep learning approaches. To enhance accuracy, most studies employ deep learning methods, but the value of traditional machine learning methods should not be underestimated. Additionally, existing code smell detection methods do not pay sufficient attention to the textual features in the code. To address these issues, this paper proposes a code smell detection method, SCSmell, which uses static analysis tools to extract structural features, then converts the code into txt format, also using static analysis tools, and feeds it into the pre-trained BERT model to extract textual features. The structural features are combined with the textual features to generate sample data, and code smell instances are labeled. The RFECV (recursive feature elimination with cross-validation) method is then used to select the important structural features. To deal with data imbalance, the Borderline-SMOTE method is used to generate additional positive samples, and a three-layer Stacking model is ultimately employed to detect code smells. In our experiments, we select 44 large real-world projects as the training and testing sets and perform detection for four types of code smells: brain class, data class, God class, and brain method. The experimental results indicate that SCSmell improves the average accuracy by 10.38% compared to existing detection methods, while maintaining high precision, recall, and F1 scores. SCSmell is therefore an effective solution for code smell detection.
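
The abstract gives no implementation details, so the following is only a minimal sketch of the general Borderline-SMOTE idea mentioned above, written with the Python standard library. It is not SCSmell's actual code. The function name borderline_smote, its parameters (n_new, k, seed), and the two-feature toy data are all hypothetical and chosen for illustration. A minority ("smelly") sample is treated as borderline when at least half, but not all, of its k nearest neighbours belong to the majority class, and new samples are interpolated between such borderline samples and their minority neighbours.

```python
import math
import random


def borderline_smote(minority, majority, n_new, k=3, seed=0):
    """Return n_new synthetic minority points built from borderline samples.

    Hypothetical helper for illustration; points are tuples of floats.
    """
    if len(minority) < 2:
        return []
    rng = random.Random(seed)
    # Label 1 marks the minority (smelly) class, label 0 the majority class.
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    minority_pool = labelled[:len(minority)]

    def neighbours(idx, pool):
        # k entries of pool closest to pool[idx], excluding pool[idx] itself.
        origin = pool[idx][0]
        rest = [entry for j, entry in enumerate(pool) if j != idx]
        return sorted(rest, key=lambda e: math.dist(origin, e[0]))[:k]

    # "Danger" samples: at least half, but not all, of the k nearest
    # neighbours (searched over both classes) are majority points.
    danger = []
    for i in range(len(minority)):
        n_major = sum(1 for _, label in neighbours(i, labelled) if label == 0)
        if k / 2 <= n_major < k:
            danger.append(i)

    synthetic = []
    if not danger:
        return synthetic
    for n in range(n_new):
        i = danger[n % len(danger)]
        # Interpolate towards a random minority neighbour of the danger sample.
        target, _ = rng.choice(neighbours(i, minority_pool))
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], target))
        )
    return synthetic


# Toy data (made up): two metric values per sample.
minority = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (2.0, 2.0)]
majority = [(2.5, 2.0), (2.0, 2.6), (4.0, 4.0), (4.5, 4.0), (4.0, 4.5)]
synthetic = borderline_smote(minority, majority, n_new=4)
print(len(synthetic), "synthetic samples")
```

In this toy data only (2.0, 2.0) sits next to majority points, so every synthetic sample lies on a segment between it and another minority point. In practice a maintained implementation, such as the BorderlineSMOTE class in the imbalanced-learn package, would likely be preferable to hand-written code.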

Author Biographies

Dongwen Zhang, The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018; Hebei Technology Innovation Center of Intelligent IoT, Shijiazhuang, Hebei 050018, China.

DONGWEN ZHANG received the Ph.D. degree from the Department of Automation Control at the Beijing Institute of Technology. She is currently a professor with the School of Information Science and Engineering, Hebei University of Science and Technology. Her research interests include software refactoring for parallelism and parallel programming.

Shuai Song, The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018.

SHUAI SONG is currently pursuing his master’s degree in the School of Information Science and Engineering, Hebei University of Science and Technology. His research interests include parallel programming and software refactoring.

Yang Zhang, The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018; Hebei Technology Innovation Center of Intelligent IoT, Shijiazhuang, Hebei 050018, China.

YANG ZHANG received the Ph.D. degree from the School of Computer, Beijing Institute of Technology. He is currently a professor with the School of Information Science and Engineering, Hebei University of Science and Technology. His research interests include software refactoring for parallelism and parallel programming.

Haiyang Liu, The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018.

HAIYANG LIU is currently pursuing his master’s degree in the School of Information Science and Engineering, Hebei University of Science and Technology. His research interests include software testing and software refactoring.

Gaojie Shen, The School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, China, 050018.

GAOJIE SHEN obtained a bachelor’s degree in electronic information from Anhui University of Technology and is now studying computer science at Hebei University of Science and Technology. Research interests include software refactoring and parallel programming design.

Published

2023-12-21

How to Cite

Zhang, D., Song, S., Zhang, Y., Liu, H., & Shen, G. (2023). Code Smell Detection Research Based on Pre-training and Stacking Models. IEEE Latin America Transactions, 22(1), 22–30. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/8379