Outlier Detection Method for Identifying Outliers that are not in Gaussian Distribution

Adikaram, K.K.L.B.; Hussein, M.A.; Effenberger, M.; Becker, T.

dc.contributor.author	Adikaram, K.K.L.B.
dc.contributor.author	Hussein, M.A.
dc.contributor.author	Effenberger, M.
dc.contributor.author	Becker, T.
dc.date.accessioned	2022-08-25T04:04:29Z
dc.date.available	2022-08-25T04:04:29Z
dc.date.issued	2015-03-04
dc.identifier.citation	Adikaram, K. K. L. B., Hussein, M. A., Effenberger, M. & Becker, T. (2015). Outlier Detection Method for Identifying Outliers that are not in Gaussian Distribution. 12th Academic Sessions, University of Ruhuna, Matara, Sri Lanka, 86.
dc.identifier.issn	2362-0412
dc.identifier.uri	http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/7883
dc.description.abstract	Most statistical methods require outliers (noise) to follow a Gaussian distribution. When outliers do not follow a Gaussian distribution, these methods produce biased results. We introduce an outlier detection method that performs best when the outliers are non-Gaussian. The method is non-parametric and based on properties of arithmetic progression (AP). If the number of elements in an AP is $n$, the maximum element is $a_{\max}$, the minimum element is $a_{\min}$, and the sum of all elements is $S_n$, then $R_{\max} = (a_{\max} - a_{\min}) / (S_n - a_{\min} n)$ and $R_{\min} = (a_{\max} - a_{\min}) / (a_{\max} n - S_n)$ are always equal to $2/n$. Usually, $R_{\max} > 2/n$ implies that the maximum element is an outlier and $R_{\min} > 2/n$ implies that the minimum element is an outlier. The value $2/n$ is non-parametric and always between 0 and 1. If $t$ is a threshold relevant to the considered domain, the value $2/n + t$ can be used to identify significant outliers, where $0 < t < 1 - 2/n$. The method identifies one outlier at a time, and repeated application of the method allows detection of multiple outliers. The algorithm was tested using several artificial and real data sets. The real data sets were recorded automatically at a biogas plant, at a frequency of twelve data points per day over a period of seven months. Among the different parameters, we selected the H2 content, which we expected to maintain linear behavior during stable operation. When the outliers are non-Gaussian, Grubbs' test locates 0% to 17% as significant outliers at a significance level of 0.05. With the new method, there was a value of t that was capable of locating more outliers than Grubbs' test.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Ruhuna, Matara, Sri Lanka	en_US
dc.subject	Gaussian distribution	en_US
dc.subject	multiple outlier detection	en_US
dc.subject	non-parametric method	en_US
dc.subject	significant outliers	en_US
dc.title	Outlier Detection Method for Identifying Outliers that are not in Gaussian Distribution	en_US
dc.type	Article	en_US
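
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The ratios follow from the AP sum $S_n = n(a_{\min} + a_{\max})/2$, which makes both denominators equal to $n(a_{\max} - a_{\min})/2$ and hence both ratios equal to $2/n$. The function names `ap_ratios` and `detect_outliers`, the requirement of at least three values, and the tie-breaking rule used when both ratios exceed the limit are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def ap_ratios(values: Sequence[float]) -> Tuple[float, float]:
    """Return (R_max, R_min) as defined in the abstract.

    Assumption: at least three values that are not all equal are required,
    since otherwise the ratios are undefined or no threshold t can exist.
    """
    n = len(values)
    a_max, a_min, s_n = max(values), min(values), sum(values)
    spread = a_max - a_min
    if n < 3 or spread == 0:
        raise ValueError("need at least three values that are not all equal")
    r_max = spread / (s_n - a_min * n)
    r_min = spread / (a_max * n - s_n)
    return r_max, r_min


def detect_outliers(values: Sequence[float], t: float) -> Tuple[List[float], List[float]]:
    """Remove one outlier at a time until neither ratio exceeds 2/n + t.

    Returns (outliers, remaining). The threshold t is domain specific.
    """
    remaining = list(values)
    outliers: List[float] = []
    while len(remaining) >= 3 and max(remaining) != min(remaining):
        r_max, r_min = ap_ratios(remaining)
        limit = 2 / len(remaining) + t
        if r_max <= limit and r_min <= limit:
            break
        # Assumption: if both ratios exceed the limit, treat the element
        # with the larger ratio as the outlier for this pass.
        candidate = max(remaining) if r_max >= r_min else min(remaining)
        remaining.remove(candidate)
        outliers.append(candidate)
    return outliers, remaining


if __name__ == "__main__":
    # A perfect AP gives R_max = R_min = 2/n.
    print(ap_ratios([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    # One large value added to an otherwise linear series.
    print(detect_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], t=0.1))
```

The value `t=0.1` in the usage lines is an arbitrary illustration; the abstract leaves the choice of threshold to the application domain.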