An outlier detection method based on distance for high dimensional datasets

Jesus Carlos Carmona Frausto; Ivan Lopez Arevalo; Josep Maria Mateo Sanz; Lauriano Jimenez Esteller; Edwyn Aldana Bobadilla

Authors

Jesus Carlos Carmona Frausto, Cinvestav-Tamaulipas
Ivan Lopez Arevalo, Cinvestav-Tamaulipas
Josep Maria Mateo Sanz, Departament d'Enginyeria Química, Universitat Rovira i Virgili
Lauriano Jimenez Esteller, Departament d'Enginyeria Química, Universitat Rovira i Virgili
Edwyn Aldana Bobadilla, Conacyt-Cinvestav

Keywords:

outliers, distance based, deterministic method, high dimensions

Abstract

Tasks such as classification, clustering and regression require analyzing data, and the results of this analysis are sensitive to data quality. In real-world settings, there is a high probability that the data contain errors and anomalies, which affect the results of such analysis. It is therefore necessary to preprocess the data to ensure its quality for further analysis. One of the tasks included in this preprocessing is the detection of outliers. This paper proposes a distance-based method for detecting outliers in multivariate datasets. The method exploits the low cost of computing distances, as well as the high sensitivity of the mean to outlying values.
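The abstract does not describe the procedure itself, so the following is only a minimal sketch of a generic distance-to-mean rule. It is not the authors' method. It illustrates the two properties the abstract mentions: distances are cheap to compute, and the mean is pulled toward outlying values. The function name `mean_distance_outliers`, the threshold parameter `k`, and the sample data are all hypothetical.

```python
import math


def mean_distance_outliers(points, k=2.0):
    """Return indices of points far from the sample mean.

    Illustrative rule (an assumption, not taken from the paper): a point is
    flagged when its Euclidean distance to the mean vector exceeds
    mean(distances) + k * std(distances).
    """
    n = len(points)
    dim = len(points[0])
    # Mean vector of the dataset, one coordinate at a time.
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    # One distance per point: O(n * dim) work overall.
    dists = [math.dist(p, centroid) for p in points]
    mu = sum(dists) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / n)
    return [i for i, d in enumerate(dists) if d > mu + k * sigma]


# Hypothetical data: six points near (1, 1, 1) and one distant point.
data = [
    (1.0, 1.0, 1.0),
    (1.1, 0.9, 1.0),
    (0.9, 1.1, 1.0),
    (1.0, 1.0, 1.1),
    (1.0, 0.9, 0.9),
    (1.05, 0.95, 1.0),
    (8.0, 8.0, 8.0),
]
flagged = mean_distance_outliers(data)
print(flagged)
```

Because the distant point drags the mean toward itself, the threshold is also inflated. A practical method would need to account for this masking effect, which this sketch deliberately ignores.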

Author Biographies

Jesus Carlos Carmona Frausto, Cinvestav-Tamaulipas

Graduated in Computer Systems Engineering (2008) from the National Technological Institute of Mexico (Ciudad Victoria); Master's degree in Computer Science from CINVESTAV Tamaulipas (2012). Currently a doctoral student in Computer Science with interests in data analysis, particularly the analysis of atypical values and sampling methods.

Ivan Lopez Arevalo, Cinvestav-Tamaulipas

He obtained his PhD in Computing from the Polytechnic University of Catalonia (Barcelona). He is currently an associate professor at Cinvestav Unidad Tamaulipas. His research interests cover data analysis in databases, the Web and social networks, including Data Mining, Text Mining and the Semantic Web. His work also includes Soft Computing topics in Engineering.

Josep Maria Mateo Sanz, Departament d'Enginyeria Química, Universitat Rovira i Virgili

He obtained master's and doctoral degrees in mathematics from the University of Barcelona. He is currently an Associate Professor of Statistics in the Department of Chemical Engineering at the Rovira i Virgili University (Spain). His research interests are in statistics and operational research for process design.

Lauriano Jimenez Esteller, Departament d'Enginyeria Química, Universitat Rovira i Virgili

He obtained a Ph.D. degree in Chemical Engineering from the University of Barcelona. He is currently a professor in the Department of Chemical Engineering at the Rovira i Virgili University (Spain). His interests lie in the sustainable design of chemical processes.

Edwyn Aldana Bobadilla, Conacyt-Cinvestav

Graduated in Computer Engineering (2003) from the District University (Bogota, Colombia); Master's and Doctorate in Computer Science from the National Autonomous University of Mexico (2009, 2015). He currently works as a Conacyt Researcher at CINVESTAV Tamaulipas. His research interests include machine learning, digital electronics, optimization, stochastic processes, software engineering and data analysis.
