An outlier detection method based on distance for high dimensional datasets

Authors

  • Jesus Carlos Carmona Frausto, Cinvestav-Tamaulipas
  • Ivan Lopez Arevalo, Cinvestav-Tamaulipas
  • Josep Maria Mateo Sanz, Departament d'Enginyeria Química, Universitat Rovira i Virgili
  • Lauriano Jimenez Esteller, Departament d'Enginyeria Química, Universitat Rovira i Virgili
  • Edwyn Aldana Bobadilla, Conacyt-Cinvestav

Keywords:

outliers, distance based, deterministic method, high dimensions

Abstract

Tasks such as classification, clustering, and regression require analyzing data, and the results of this analysis are sensitive to data quality. In real life, there is a high probability that the data contain errors and anomalies that affect the results of such analysis. It is therefore necessary to preprocess the data to ensure its quality before further analysis. One of the tasks included in this preprocessing is the detection of outliers. This paper proposes a distance-based method for detecting outliers in multivariate datasets. The method exploits the low computational cost of distance calculations, as well as the high sensitivity of the mean to outlier values.
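The abstract names two ingredients: cheap distance computations and the sensitivity of the mean to outliers. The sketch below shows one generic way those ingredients can combine. It is not the authors' algorithm, whose details are not given here. Assumptions: each point is scored by its Euclidean distance to the component-wise mean (centroid), and a point is flagged when that distance exceeds the mean distance plus `k` standard deviations. The function names and the threshold `k` are hypothetical.

```python
import math
from statistics import mean, pstdev


def centroid(points):
    """Component-wise mean of equal-length numeric vectors.

    The mean is pulled toward extreme values, so outliers end up far from it.
    """
    dims = len(points[0])
    return [mean(p[j] for p in points) for j in range(dims)]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_outliers(points, k=2.0):
    """Hypothetical rule: flag indices whose distance to the centroid
    exceeds mean(distances) + k * population std(distances)."""
    c = centroid(points)
    dists = [euclidean(p, c) for p in points]  # one distance per point: O(n * d)
    cutoff = mean(dists) + k * pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


if __name__ == "__main__":
    data = [[1, 1, 1], [1.1, 0.9, 1.0], [0.9, 1.0, 1.1],
            [1.0, 1.1, 0.9], [1.0, 1.0, 1.0], [10, 10, 10]]
    print(centroid(data))           # the single extreme point drags the mean to about 2.5
    print(distance_outliers(data))  # index of the extreme point
```

Only one distance per point is computed, so the cost grows linearly with both the number of points and the number of dimensions. A single extreme point visibly shifts the centroid, which is the sensitivity of the mean that the abstract refers to. Note that with a population standard deviation a lone outlier among `n` points can reach at most about `sqrt(n - 1)` standard deviations, so `k` must be chosen with the sample size in mind.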


Author Biographies

Jesus Carlos Carmona Frausto, Cinvestav-Tamaulipas

He graduated in Computer Systems Engineering (2008) from the National Technological Institute of Mexico (Ciudad Victoria) and received a Master's degree in Computer Science from CINVESTAV Tamaulipas (2012). He is currently a doctoral student in Computer Science, with interests in data analysis, particularly the analysis of atypical values and sampling methods.

Ivan Lopez Arevalo, Cinvestav-Tamaulipas

He obtained his PhD in Computing from the Polytechnic University of Catalonia (Barcelona). He is currently an associate professor at Cinvestav Unidad Tamaulipas. His interests cover various aspects of data analysis in databases, the Web, and social networks, such as Data Mining, Text Mining, and the Semantic Web. His work also includes Soft Computing applications in Engineering.

Josep Maria Mateo Sanz, Departament d'Enginyeria Química, Universitat Rovira i Virgili

He obtained master's and doctoral degrees in mathematics from the University of Barcelona. He is currently an Associate Professor of Statistics in the Department of Chemical Engineering at the Rovira i Virgili University (Spain). His research interests are in statistics and operational research for process design.

Lauriano Jimenez Esteller, Departament d'Enginyeria Química, Universitat Rovira i Virgili

He obtained a Ph.D. degree in Chemical Engineering from the University of Barcelona. He is currently a professor in the Department of Chemical Engineering of the Rovira i Virgili University (Spain). His interests lie in the sustainable design of chemical processes.

Edwyn Aldana Bobadilla, Conacyt-Cinvestav

He graduated in Computer Engineering (2003) from the District University (Bogotá, Colombia) and received his Master's and Doctorate in Computer Science from the National Autonomous University of Mexico (2009 and 2015, respectively). He currently works as a Conacyt Researcher at CINVESTAV Tamaulipas. His research interests include machine learning, digital electronics, optimization, stochastic processes, software engineering, and data analysis.

Published

2020-04-12

How to Cite

Carmona Frausto, J. C., Lopez Arevalo, I., Mateo Sanz, J. M., Jimenez Esteller, L., & Aldana Bobadilla, E. (2020). An outlier detection method based on distance for high dimensional datasets. IEEE Latin America Transactions, 18(3), 589–597. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/273