An Effective VNS Algorithm for k-Medoids Clustering Problem
Keywords:k-medoids clustering, VNS, mathematical programming
This paper proposes an algorithm based on VNS metaheuristcs for k-medoids clustering, which is a NP-hard optimization problem. The VNS algorithm was applied in fifty data bases (instances) with small, medium, and large sizes, considering the number of clusters between 2 and 7. The obtained results from these experiments show the effectiveness of this approach, comparing it with nine other related clustering algorithms and an optimization formulation. Furthermore, we found that our algorithm obtained the optimal solutions for the vast majority of the cases.
D. Gunopulos, Clustering Overview and Applications. LIU L., ÖZSU M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, M, 2009.
J. Hair, W. Black, and R. Babin, B.J.and Anderson, Multivariate Data Analysis. Cengage, 8th edition, 2018.
J. Han, M. Kamber, and J. Pei, Data Mining – Concepts and Techniques. Elsevier, 3rd edition, 2012.
P. Hansen and B. Jaumard, “Cluster analysis and mathematical programming,”Mathematical Programming, vol. 79, pp. 191–215, 1997.
J. Fiorucci, F. Toledo, and M. Nascimento, “Heuristics for minimizing the maximum within-clusters distance,” Pesquisa Operacional, vol. 32, no. 3, pp. 497–522, 2012.
J. A. M. Brito, A. C. Fadel, G. S. Semaan, and L. R. Brito, “Algoritmo genético de chaves aleatória viciadas aplicado ao problema de agrupamento com soma mínima,” in Anais do XLIX Simpósio Brasileiro de Pesquisa Operacional, vol. 11, pp. 2325–2336, Sobrapo, 2017.
J. A. M. Brito, A. C. Fadel, G. S. Semaan, and F. M. T. Montenegro, “Heuristics applied to minimization of the maximum-diameter clustering problem,” IEEE Latin America Transactions, vol. 100, no. 1, 2021.
R. Johnson and D. Wichern, Applied Multivariate Statistical Analysis(Classic Version). Pearson, 6th edition, 2018.
S. Zhu, L. Xu, and E. Goodman, “Evolutionary multi-objective automatic clustering enhanced with quality metrics and ensemble strategy,” Knowledge-Based Systems, vol. 188(5), pp. 1–21, 2020.
G. Semaan, A. Fadel, J. Brito, and L. Ochi, “A hybrid efficient heuristic with hopkins statistic for the automatic clustering problem,” IEEE Latin America Transactions, vol. 17(1), pp. 7–17, 2019.
G. Semaan, Algoritmos para o Problema de Agrupamento Automático. PhD thesis, Federal Fluminense University, 2013.
A. J. O. Reyes, A. O. Garcia, and Y. L. Mue, “System for processing and analysis of information using clustering technique,” IEEE Latin America Transactions, vol. 12(2), pp. 364–371, 2014.
J. C. R. Thomas, M. S. Penãs, M. M. Cofre, and N. D. Carralero, “Performance analysis of clustering internal validation indexes with asymmetric clusters,” IEEE Latin America Transactions, vol. 17(5), pp. 807 – 814, 2019.
M. Nascimento, F. Toledo, and A. de Carvalho, “Investigation of a new grasp - based clustering algorithm applied to biological data,” Computers & Operations Research, vol. 37, pp. 1381–1388, 2010.
D. Xu and Y. Tian, “A comprehensive survey of clustering algorithm,” Annals of Data Science, vol. 2(2), pp. 165–193, 2015.
R. Oliveira, A. Chaves, and L. Lorena, “A comparison of two hybrid methods for constrained clustering problems,” Applied Soft Computing, vol. 54, pp. 256–266, 2017.
M. Rao, “Cluster analysis and mathematical programming,” Journal of American Statistical Association, vol. 1971, pp. 622–626, 1971.
N. Negreiros, M.J.and Maculan, P. Batista, J. Rodrigues, and A. Palhano, “Capacitated clustering problems applied to the layout of it-teams in software factories,” Annals of Operations Research - doi.org/10.1007/s10479-020-03785-4, vol. x, pp. xx–xx, 2020.
L. Kaufman and P. Rousseeuw, Finding Groups in Data – An Introduction to Cluster Analysis. Wiley-Interscience, 1st edition, 1990.
J. Han and R. Ng, “Clarans: A method for clustering objects for spatial datamining,” IEEE Transactions Knowledge of Data Enginnering, vol. 14(5), pp. 1003–1016, 2002.
N. Mladenovi´c and P. Hansen, “Variable neighborhood search,” Computers & Operations Research, vol. 24, no. 11, pp. 1097–1100, 1997.
P. Hansen, N. Mladenovi´c, and J. Perez, “Variable neighbourhood search: methods and applications,” Annals of Operations Research, vol. 175, pp. 367–407, 2010.
M. Gendreau and J.-Y. Potvin, Handbook of Metaheuristics. Springer, 3rd edition, 2019.
N. Megiddo and K. J. Supowit, “On the complexity of some common geometric location problems,” SIAM Journal on Computing, vol. 13(1), pp. 182–196, 1984.
H. Vinod, “Integer programming and theory of grouping,” Journal of American Statistical Association, vol. 64, pp. 506–517, 1969.
L. Wolsey, Integer Programming. Wiley, 2nd edition, 2020.
M. J. van der Laan, K. S. Pollard, and J. Bryan, “A new partitioning around medoids algorithm,” Journal of Statistical Computation and Simulation, vol. 73(8), pp. 575–584, 2003.
Q. Zhang and I. Couloigner, “A new and efficient k-medoid algorithm for spatial clustering,” Lecture Notes in Computer Science, vol. 342, pp. 181–189, 2005.
S. C. Chu, J. F. Roddick, and J. S. Pan, “Improved search strategies and extensions to k-medoids-based clustering algorithms,” International Journal of Business Intelligence and Data Mining, vol. 3(2), pp. 212– 231, 2009.
H. S. Park and C. H. Jun, “A simple and fast algorithm for k-medoids clustering,” Expert Systems with Applications, vol. 36, no. 2, pp. 3336– 3341, 2009.
M. Nascimento, F. Toledo, and A. de Carvalho, “Hybrid heuristc for the k-medoids clustering problem,” A Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, vol. 1, pp. 417–424, 2012.
S. Zadegan, M. Mirzaie, and F. Sadoughi, “Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets,” Elsevier, vol. 39, pp. 133–143, 2013.
D. Yu, G. Liu, M. Guo, and X. Liu, “An improved k-medoids algorithm based on step increasing and optimizing,” Expert Systems with Applications, vol. 92, pp. 464–47, 2018.
H. Song andW. Lee, J.G.and Han, “Pamae: Parallel k-medoids clustering with high accuracy and efficiency,” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 1087–1096, 2017.
E. Schubert and P. J. Rousseeuw, “Faster k-medoids clustering: Improving the pam, clara, and clarans algorithms,” Similarity Search and Applications - Lecture Notes in Computer Science, vol. 11807, p. 171–187, 2019.
A. Ushakov and I. Vasilyev, “A parallel heuristic for a k-medoids clustering problem with unfixed number of clusters,” MIPRO - International Convention on Information and Communication Technology, Electronics and Microelectronics,Proceedings, pp. 1116–1120, 2019.
Y. W. Wang, A. Jian, Y. Liang, and H. Wang, “Optimization of kmedoids algorithm for initial clustering center,” in IOP Conf. Series: Journal of Physics:, vol. 1487, pp. 1–7, IOP Publishing, 2020.
A. Ushakov and I. Vasilyev, “Near-optimal large-scale k-medoids clustering,” Information Sciences, vol. 545, pp. 344–362, 2021.
C. Lucasius, A. Dane, and G. Kateman, “On k-medoid clustering of large data sets with the aid of a genetic algorithm: background, feasibility and comparison,” Analytica Chimica Acta, vol. 282, pp. 647–669, 1993.
W. Sheng and X. Liu, “A hybrid algorithm of k-medoid clustering of large data sets,” Proceedings of the Congress on Evolutionary Computation – IEE, pp. 77–82, 2004.
W. Sheng and X. Liu, “A genetic k-medoids clustering algorithm,” Journal of Heuristics, vol. 12, pp. 447–46, 2006.
J. A. M. Brito, G. S. Semaan, and L. R. Brito, “Resolução do problema dos k-medoids via algoritmo genético de chaves aleatórias viciadas,” Revista Pesquisa Naval, vol. 27, pp. 126–142, 2015.
A. Hudaib, M. Khanafseh, and O. Surakhi, “An improved version of k-medoid algorithm using cro,” Modern Applied Science, vol. 12, no. 2, pp. 116–12, 2018.
R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.