An outlier detection method based on distance for high-dimensional datasets
Keywords:
outliers, distance-based, deterministic method, high dimensions
Abstract
Tasks such as classification, clustering, and regression require analyzing data, and the results of this analysis are sensitive to data quality. In practice, data are likely to contain errors and anomalies that distort these results. It is therefore necessary to carry out preprocessing tasks that ensure data quality before further analysis. One of these tasks is outlier detection. This paper proposes a distance-based method for detecting outliers in multivariate datasets. The method exploits the low computational cost of distance calculations, as well as the high sensitivity of the mean to outlier values.
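The abstract does not give the algorithm itself, so the following is only a generic distance-to-mean baseline that illustrates the two ideas it names: distances are cheap to compute, and a single extreme point pulls the mean toward itself. It is not the authors' method. The function names and the thresholding rule (flag a point if its distance to the mean exceeds the average distance plus `k` standard deviations of all distances) are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    # Component-wise mean of all points; an extreme point shifts it noticeably.
    if not points:
        raise ValueError("points must be non-empty")
    n, dim = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(dim)]


def distance_outliers(points: Sequence[Sequence[float]], k: float = 2.0) -> List[int]:
    # Hypothetical rule: flag indices whose distance to the mean exceeds
    # mean(distances) + k * std(distances), using the population std.
    c = centroid(points)
    dists = [euclidean(p, c) for p in points]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mu + k * sigma]


if __name__ == "__main__":
    data = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [10, 10]]
    print(distance_outliers(data))  # [5]
```

With the sample data the point `(10, 10)` moves the mean to about `(2.08, 2.08)` and is the only one flagged. Each point costs one pass over its coordinates, so the whole score is linear in the number of points times the dimension; how the actual method sets its threshold is not stated in the abstract.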