PBCCUT: Priority Based Class Clustered Under-Sampling Technique Approaches for Imbalanced Data Classification

Affiliations

  • Dr. C.S.N Degree & P.G College, Bhimavaram – 534210, Andhra Pradesh, India
  • A.U Research Center & External Affairs, Department of Information Technology, SRKR Engineering College, Bhimavaram – 534204, Andhra Pradesh, India

Abstract


Objective: Data mining is one of the most active areas of research and is becoming increasingly accepted in health care organizations. Advanced structures of classifiers for imbalanced datasets are described. Class imbalance is a major difficulty in machine learning and occurs in many domains; most medical datasets are not balanced in their class labels. Conventional classifiers do not perform well on data subject to both within-class and between-class imbalance.

Methodology: Most available classification methods tend to perform poorly on minority class examples when the dataset is highly imbalanced. This paper examines the accuracy obtained by using the Priority Based Class Clustered Under-Sampling Technique (PBCCUT) approaches for imbalanced data classification.

Findings: Variations of adaptive K-means cluster analysis are presented in which the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation.

Improvements: The paper proposes a cluster-based priority under-sampling approach that selects representative data as training data, both to improve classification accuracy for the minority class and to examine the effect of under-sampling methods in an imbalanced class distribution environment.
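As a rough illustration of cluster-based under-sampling in general, the sketch below clusters the majority class and keeps a few representative points from each cluster. It is not the authors' PBCCUT algorithm: the abstract does not specify the adaptive K-means variant or the priority rule, so plain K-means and a "closest to the centroid first" priority are assumptions made here for illustration. The names `kmeans`, `cluster_undersample`, and `n_keep` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption; not the paper's adaptive variant)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority points, spread across k clusters."""
    centroids, labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        # Quota proportional to cluster size, at least one point per cluster.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        # Hypothetical priority rule: points closest to the centroid first.
        members.sort(key=lambda p: math.dist(p, centroids[c]))
        kept.extend(members[:quota])
    return kept


# Synthetic data: 60 majority points in three blobs, 10 minority points.
rng = random.Random(42)
majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(10)]

kept = cluster_undersample(majority, n_keep=len(minority), k=3)
print(len(majority), "->", len(kept), "majority samples kept")
```

Because each cluster's quota is rounded, the number of retained majority samples may differ from `n_keep` by a few points. The retained samples would then be combined with all minority samples to form the training set.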

Keywords

Class, Clustering, Data Mining, Imbalanced Data, PBCCUT





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.