Investigation and Optimization of StringDeduplication with Custom Heuristic in Different Versions of the JVM

Keywords

StringDeduplication, JVM Performance, Memory Optimization, Heuristics, Native Code Integration

Abstract

Memory optimization in Java applications is essential for performance and scalability. This paper investigates the efficiency of the StringDeduplication parameter in JVM versions 11, 17, and 21, using a Web Crawler developed in Spring Boot as the workload. The results show that the deduplication rate fell from 34.3% in version 11 to 3.4% in version 21, while deduplication time rose from 1,264 ms to 3,439 ms. To mitigate this problem, a custom solution in C was developed for JVM version 21, which raised the deduplication rate to 31.1% and saved 110.2 MB of memory. The main scientific contribution of this work is twofold: it identifies the loss of StringDeduplication efficiency in the latest JVM versions, and it proposes a custom solution that improves string deduplication, offering a viable alternative for developers and software engineers.
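To make the idea of string deduplication concrete, the following is a minimal Java sketch, not the authors' C solution and not the JVM's internal mechanism. It assumes that the "StringDeduplication parameter" refers to the HotSpot flag `-XX:+UseStringDeduplication`, which lets the garbage collector share the backing arrays of equal strings. The sketch instead canonicalizes equal strings at the application level through a map. The class name `DedupSketch`, its methods, and the example URLs are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: application-level canonicalization of equal strings.
// This is NOT the paper's custom C heuristic and NOT how the JVM flag works internally.
class DedupSketch {

    /** Returns a list in which equal strings are replaced by one shared instance. */
    static List<String> deduplicate(List<String> input) {
        Map<String, String> canonical = new HashMap<>();
        List<String> out = new ArrayList<>(input.size());
        for (String s : input) {
            // putIfAbsent returns the previously stored instance, or null if s is new
            String existing = canonical.putIfAbsent(s, s);
            out.add(existing != null ? existing : s);
        }
        return out;
    }

    /** Percentage of entries whose contents repeat an earlier entry. */
    static double duplicateRate(List<String> input) {
        if (input.isEmpty()) {
            return 0.0;
        }
        long distinct = input.stream().distinct().count();
        return 100.0 * (input.size() - distinct) / input.size();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            // Runtime concatenation yields distinct objects with equal contents
            urls.add("https://example.org/page" + (i % 3));
        }
        List<String> shared = deduplicate(urls);
        // Reference comparison: do the first and fourth entries share one object?
        System.out.println("same instance: " + (shared.get(0) == shared.get(3)));
        System.out.println("duplicate rate (%): " + duplicateRate(urls));
    }
}
```

A workload such as a web crawler tends to hold many equal strings (repeated URLs, header names, markup fragments), which is why the proportion of duplicates that a deduplication mechanism actually reclaims matters for memory use.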

Author Biographies

Darlan Noetzold, Instituto Federal de Educação, Ciência e Tecnologia Sul-Rio-Grandense

D. Noetzold is pursuing a master's degree at the University of Vale do Rio dos Sinos (UNISINOS), São Leopoldo, Brazil. He received a Bachelor's degree in Computer Science from the Federal Institute Sul-rio-grandense (IFSUL), Passo Fundo, Brazil, in 2023. He also works as a Software Developer at CWI Software.

Anubis Graciela de Moraes Rossetto, Instituto Federal Sul-rio-grandense, Brazil

Anubis G. D. M. Rossetto received a Ph.D. degree in Computer Science from the Federal University of Rio Grande do Sul (UFRGS) in 2016 and a Master's degree in Computer Science from the Federal University of Santa Catarina in 2007. She is currently a Professor at the Federal Institute Sul-rio-grandense, Câmpus Passo Fundo. She maintains cooperation with other research groups in Brazil, France, and Portugal. Her main lines of research are distributed systems, mobile computing, the Internet of Things, and technologies in education.

Jorge Barbosa, University of Vale do Rio dos Sinos (UNISINOS), São Leopoldo, Brazil

Jorge Barbosa received M.Sc. and Ph.D. degrees in computer science from the Federal University of Rio Grande do Sul, Brazil. He conducted post-doctoral studies at Sungkyunkwan University (SKKU, Suwon, South Korea) and the University of California, Irvine (UCI, Irvine, USA). Jorge is a full professor at the Applied Computing Graduate Program (PPGCA) of the University of Vale do Rio dos Sinos (UNISINOS), head of the university’s Mobile Computing Lab (MOBILAB), and a researcher at the Brazilian Council for Scientific and Technological Development (CNPq). His main research interests are Ubiquitous Computing, Ambient Intelligence, Big Data, Internet of Things (IoT), and Machine Learning.

Valderi Reis Quietinho Leithardt, Instituto Superior de Engenharia de Lisboa, Portugal

Valderi R. Q. Leithardt (Senior Member, IEEE) received a Ph.D. degree in computer science from INF-UFRGS, Brazil, in 2015. He is currently a Professor with the Lisbon School of Engineering (ISEL), Polytechnic University of Lisbon (IPL), Portugal. His main research interests include distributed systems, focusing on data privacy, communication, and programming protocols, involving scenarios and applications for the Internet of Things, smart cities, big data, cloud computing, and blockchain.

Published

2024-12-16

How to Cite

Noetzold, D., Rossetto, A. G. de M., Barbosa, J., & Leithardt, V. R. Q. (2024). Investigation and Optimization of StringDeduplication with Custom Heuristic in Different Versions of the JVM. IEEE Latin America Transactions, 23(1), 43–49. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/9188