Investigation and Optimization of StringDeduplication with Custom Heuristic in Different Versions of the JVM
Keywords:
StringDeduplication, JVM Performance, Memory Optimization, Heuristics, Native Code Integration
Abstract
Memory optimization is essential to the performance and scalability of Java applications. This paper investigates the efficiency of the StringDeduplication parameter in JVM versions 11, 17, and 21, using a web crawler built with Spring Boot. The results show that the deduplication rate fell from 34.3% in version 11 to 3.4% in version 21, while deduplication time rose from 1,264 ms to 3,439 ms. To mitigate this regression, a custom solution written in C was developed for JVM version 21; it raised the deduplication rate to 31.1% and saved 110.2 MB of memory. The main scientific contributions of this work are the identification of StringDeduplication's loss of efficiency in recent JVM versions and a custom solution that restores effective string deduplication, offering a viable alternative for developers and software engineers.
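To make the reported metric concrete, the following is a minimal Java sketch of content-based string deduplication and of a deduplication rate computed as duplicates over total strings. It is not the paper's C solution and not the JVM's internal mechanism: the JVM feature (enabled with `-XX:+UseStringDeduplication`) works on the backing arrays of strings during garbage collection, whereas this sketch canonicalizes `String` objects at application level. The class `DedupSketch`, its `Result` type, and the rate formula are hypothetical names and assumptions introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: application-level canonicalization,
// not the JVM's -XX:+UseStringDeduplication and not the paper's C code.
public class DedupSketch {

    /** Outcome of one canonicalization pass. */
    public static final class Result {
        public final List<String> canonical; // input order; equal strings share one instance
        public final int total;              // number of strings seen
        public final int duplicates;         // distinct objects replaced by an earlier equal instance

        Result(List<String> canonical, int total, int duplicates) {
            this.canonical = canonical;
            this.total = total;
            this.duplicates = duplicates;
        }

        /** Assumed metric: share of strings that were replaced, in percent. */
        public double deduplicatedPercent() {
            return total == 0 ? 0.0 : 100.0 * duplicates / total;
        }
    }

    /** Replaces every string with the first equal instance encountered. */
    public static Result deduplicate(List<String> input) {
        Map<String, String> pool = new HashMap<>();
        List<String> out = new ArrayList<>(input.size());
        int duplicates = 0;
        for (String s : input) {
            String first = pool.putIfAbsent(s, s); // null when s is new to the pool
            if (first == null) {
                out.add(s);
            } else {
                out.add(first);
                if (first != s) {
                    duplicates++; // a separate object with equal content was replaced
                }
            }
        }
        return new Result(out, input.size(), duplicates);
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects, as a crawler parsing pages would produce.
        List<String> urls = List.of(
                new String("https://example.org/a"),
                new String("https://example.org/a"),
                new String("https://example.org/b"));
        Result r = deduplicate(urls);
        System.out.println(r.duplicates + " of " + r.total + " strings deduplicated");
    }
}
```

A crawler-style workload tends to produce many equal strings (repeated URLs, header names, tag names), which is why a rate of this kind is a natural way to compare JVM versions; the paper's exact measurement method may differ from this assumed formula.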