A Hybrid Efficient Heuristic with Hopkins Statistic for the Automatic Clustering Problem


  • Gustavo Silva Semaan INF - UFF
  • Augusto Cesar Fadel Instituto de Computação - UFF
  • José André de Moura Brito ENCE - IBGE
  • Luiz Satoru Ochi IC - UFF


Heuristic, Metaheuristics, Iterated Local Search, Automatic Clustering Problem, Hopkins Statistic, Silhouette Index, Density-Based


Cluster Analysis is a multivariate method for handling real problems in several fields. The area combines several unsupervised classification methods that can be applied to identify groups in a data set. Clustering problems are NP-hard, and to obtain a clustering, the number of groups k may either be fixed in advance or, in the automatic approach, be determined by evaluating a validation index. In this paper, the Silhouette Index is used, and a new Hybrid Heuristic Algorithm (HHA) is proposed to identify the ideal number of groups. The HHA combines two heuristic algorithms based on metaheuristics: an Iterated Local Search (ILS) algorithm that uses a density-based approach, and an Evolutionary Algorithm (EA) from the literature. In addition, the HHA includes a heuristic that verifies the clustering tendency of the data using the Hopkins Statistic; depending on the level of clustering tendency, the HHA applies one specific heuristic (ILS or EA). The computational experiments used three data sets from the literature, comprising eighty-two instances, all of which have been used and reported by other researchers. The effectiveness and efficiency of the proposed heuristic are reflected in substantially lower computational times and in solution quality that is competitive with the best results reported in the literature.
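The two measures named in the abstract can be sketched as follows. This is a minimal illustration, not the HHA implementation: it assumes Euclidean distance, a Hopkins sample size of about 10% of the points, a sampling window equal to the bounding box of the data, and the convention H = sum(u) / (sum(u) + sum(w)), under which H near 1 indicates clustering tendency and H near 0.5 indicates randomly spread data. The abstract does not state which convention, sample size, or tendency threshold the HHA uses, nor which tendency level selects ILS or EA, so none of that is encoded here. The function names `hopkins_statistic` and `silhouette_index` are hypothetical.

```python
import numpy as np


def hopkins_statistic(X, m=None, seed=None):
    """Hopkins statistic of data X (n x d), assuming H = sum(u) / (sum(u) + sum(w)).

    u_i: distance from a uniform random point (in the bounding box of X) to its nearest point of X.
    w_i: distance from a sampled point of X to its nearest *other* point of X.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if m is None:
        m = max(1, n // 10)  # assumed sample size: about 10% of the points
    rng = np.random.default_rng(seed)

    # w_i: nearest-neighbour distances of m sampled data points, excluding themselves
    idx = rng.choice(n, size=m, replace=False)
    dist_w = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)  # shape (m, n)
    dist_w[np.arange(m), idx] = np.inf  # ignore the zero distance of a point to itself
    w = dist_w.min(axis=1)

    # u_i: nearest-neighbour distances of m uniform points drawn in the bounding box of X
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    return u.sum() / (u.sum() + w.sum())


def silhouette_index(X, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette requires at least two groups")
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances

    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        size = same.sum()
        if size == 1:
            s[i] = 0.0  # usual convention for singleton groups
            continue
        a = D[i, same].sum() / (size - 1)  # mean distance to the other members of its group
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other group
        s[i] = (b - a) / max(a, b)
    return s.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blobs = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(10.0, 0.5, (200, 2))])
    labels = np.repeat([0, 1], 200)
    uniform = rng.uniform(0.0, 10.0, (400, 2))
    print("H (two blobs):", hopkins_statistic(blobs, seed=1))
    print("H (uniform):", hopkins_statistic(uniform, seed=1))
    print("Silhouette (two blobs):", silhouette_index(blobs, labels))
```

On the two well-separated blobs H should be close to 1 and the silhouette close to 1, while on uniform data H should stay near 0.5; in an automatic clustering setting, the silhouette would be evaluated for each candidate partition and the k with the highest value kept.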


