
Detection of Projected Outliers in Higher-Dimensional Data Sets Using an Extended Kalman Filter and Fuzzy K-Means


  • IKGPTU, Jalandhar, Kapurthala - 144603, Punjab, India
  • RIMTIET (Affiliated to Punjab Technical University), Gobindgarh - 147301, Punjab, India


Objectives: The curse of dimensionality and attribute relevance are major concerns when dealing with higher-dimensional data sets or Big Data, especially when detecting projected outliers. The objective of this research paper is to construct a robust and scalable model that highlights higher-dimensional outliers effectively and efficiently. Methods/Analysis: To detect projected outliers, a hybrid algorithm, EKFFK-Means, is constructed from two methodologies: the Extended Kalman Filter (EKF) and Fuzzy K-Means. The EKF linearizes the higher-dimensional data by estimating the current mean and covariance and refining these estimates through the Kalman gain; Fuzzy K-Means then confirms the outlying property of each data instance and categorizes it using its membership label. Findings: The EKFFK-Means model creates 30 clusters from the complete data set to detect projected outliers, and parameters such as accuracy, cluster validity, true positive rate, false positive rate, robustness and cluster quality are calculated. Improvements: The algorithm is compared with HPStream and CluStream and is shown to perform better on these parameters.
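The abstract does not state how the EKF estimates are handed to Fuzzy K-Means, so the following NumPy code is only a minimal sketch of one plausible coupling, not the authors' EKFFK-Means. It assumes an EKF with identity state and measurement models (which makes it equivalent to a linear Kalman filter) tracking the running mean of the data, measurement noise `R` taken from the sample covariance, a small process noise `Q`, and a chi-square gate on the normalized innovation squared (NIS) to propose outlier candidates. Fuzzy K-Means is implemented with the standard fuzzy c-means updates (fuzzifier `m = 2`), and a candidate is confirmed when its largest membership falls below 0.5. All function names, thresholds and the synthetic data are hypothetical, and the usage example uses 3 clusters rather than the 30 reported in the paper.

```python
# Hypothetical sketch of an EKF + fuzzy K-means outlier pipeline,
# not the authors' EKFFK-Means implementation.
import numpy as np


def identity(v):
    return v


def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R, gate=np.inf):
    """One EKF predict/update cycle; returns (state, covariance, NIS)."""
    # Predict: propagate the state through f, linearizing f with its Jacobian.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the measurement model h around the predicted state.
    H = H_jac(x_pred)
    y = z - h(x_pred)                  # innovation
    S = H @ P_pred @ H.T + R           # innovation covariance
    S_inv = np.linalg.inv(S)
    nis = float(y @ S_inv @ y)         # normalized innovation squared
    if nis > gate:
        # Reject the measurement so a suspected outlier cannot corrupt the state.
        return x_pred, P_pred, nis
    K = P_pred @ H.T @ S_inv           # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, nis


def ekf_nis_scores(X, Q, R, gate):
    """Stream the rows of X through an EKF that tracks the data mean.

    Assumed models: random-walk state (f = identity) and direct observation
    (h = identity). With these identity models the EKF reduces to a linear
    Kalman filter; a nonlinear h would be linearized through its Jacobian.
    """
    eye = np.eye(X.shape[1])

    def jac(_v):
        return eye

    x, P = X[0].astype(float), R.copy()
    scores = np.empty(len(X))
    for i, z in enumerate(X):
        x, P, scores[i] = ekf_step(x, P, z, identity, jac, identity, jac, Q, R, gate)
    return scores


def fuzzy_kmeans(X, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard fuzzy c-means (Bezdek) updates; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        inv = d ** (-2.0 / (m - 1.0))          # u_ij proportional to d_ij^(-2/(m-1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U


def detect_outliers(X, n_clusters=3, gate=18.47, min_membership=0.5):
    """Hypothetical pipeline: EKF gating proposes candidates, fuzzy K-means confirms."""
    R = np.cov(X, rowvar=False)                # assumed measurement noise: sample covariance
    Q = 0.01 * np.eye(X.shape[1])              # assumed (small) process noise
    # gate=18.47 is roughly the 99.9% chi-square quantile for 4 degrees of
    # freedom; for d dimensions use scipy.stats.chi2.ppf(0.999, d) instead.
    nis = ekf_nis_scores(X, Q, R, gate)
    _, U = fuzzy_kmeans(X, n_clusters)
    candidate = nis > gate                     # large innovation under the EKF
    ambiguous = U.max(axis=1) < min_membership  # no cluster claims the point strongly
    return np.flatnonzero(candidate & ambiguous)


# Usage: three well-separated 4-D blobs plus one far-away point (index 120).
rng = np.random.default_rng(42)
blob_centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 0, 8, 8]], dtype=float)
X = np.vstack([c + rng.normal(size=(40, 4)) for c in blob_centers]
              + [np.full((1, 4), 40.0)])
print(detect_outliers(X))
```

Gating on the NIS lets the filter skip suspected outliers instead of absorbing them into its mean estimate, while the membership test discards candidates that still sit firmly inside one cluster. Replacing `identity` and `jac` with a nonlinear model and its Jacobian turns the same `ekf_step` into a genuinely extended filter.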


Keywords: Clustering, Projected Outliers, Robustness, Scalability, Unsupervised.






This work is licensed under a Creative Commons Attribution 3.0 License.