Evolving Fuzzy Set-based and Cloud-based Unsupervised Classifiers for Spam Detection
Keywords:
Unsupervised Classification, Spam Detection, Evolving Intelligent Systems, Clustering, Data Streams
Abstract
Advances in information and network technology have made individuals and organizations more dependent on e-mail to communicate and share information. The increasing use of e-mail has led to increased production of unsolicited commercial or bulk messages, also known as spam. Spam classification systems that self-adapt over time without human intervention are rare. Such adaptation is necessary because spam varies over time as senders apply a range of message-masking techniques. Moreover, classification models that handle large volumes of data using few computational resources are of particular interest in this context. Evolving Intelligent Systems adapt their parameters and structure according to the data stream; therefore, they can cope with nonstationarities in the data. This paper applies the evolving intelligent methods TEDA (Typicality and Eccentricity based Data Analytics) and FBeM (Fuzzy Set-Based Evolving Modeling) to online unsupervised classification of spam. TEDA and FBeM are compared in terms of classification accuracy, model compactness, and processing time. For dimensionality reduction, a non-parametric feature ranking and selection algorithm based on Spearman correlation is employed. A dataset containing 25,745 samples, of which 7,830 are spam and 17,915 are legitimate e-mails, is proposed. Each sample is originally described by 711 features extracted from an e-mail server.
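As an illustration of the dimensionality-reduction step, the following minimal sketch ranks features by the absolute value of their Spearman correlation with a reference variable and keeps the highest-ranked ones. It is not the authors' implementation: the function names (`average_ranks`, `spearman`, `rank_features`), the use of a reference column `target`, and the cut-off `top_k` are assumptions made only for this example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column is scored as uninformative
    return cov / sqrt(vx * vy)


def rank_features(columns, target, top_k):
    """Return the indices of the top_k columns by |rho| with the target.

    `target` is a hypothetical reference variable; the selection
    criterion used in the paper may differ.
    """
    scores = [abs(spearman(col, target)) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]
```

Because Spearman correlation depends only on ranks, the ranking is insensitive to monotone rescaling of a feature, which is what makes the procedure non-parametric. In practice a library routine such as `scipy.stats.spearmanr` could replace the hand-written `spearman` function.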