Hybrid Dimension Reduction Techniques with Genetic Algorithm and Neural Network for Classifying Leukemia Gene Expression Data
Background/Objectives: This paper presents a hybrid framework for classifying leukemia gene expression data. The framework consists of three subsystems: a class-based dimension reduction subsystem, a feature selection subsystem, and a classification subsystem.
Methods/Statistical Analysis: The class-based dimension reduction subsystem applies Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) to the leukemia gene expression dataset: the Acute Lymphoblastic Leukemia (ALL) class is subjected to PCA and the Acute Myeloid Leukemia (AML) class to CCA, yielding dimension-reduced data. The feature selection subsystem uses a Genetic Algorithm (GA) to select an optimal subset of informative genes. The classification subsystem trains a Neural Network (NN) on these informative genes to obtain the classifier.
Findings: The performance of the hybrid framework (GA with PCA and CCA) is analyzed and compared with that of the single dimension reduction techniques, namely GA-PCA and GA-CCA. The experimental results show that the proposed framework achieves an accuracy of 88.23%, a sensitivity of 85%, and a specificity of 92.85%. The framework also aids in determining the informative genes that are relevant to leukemia gene expression data.
Applications/Improvements: The hybrid GA with PCA and CCA framework shows improved classification accuracy compared with either single dimension reduction technique. This suggests that combining more than one method yields higher classification accuracy and aids in the identification of new classes.
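To make the reported accuracy, sensitivity, and specificity concrete, the following minimal sketch computes the three measures from confusion-matrix counts. The counts used here are hypothetical and are not taken from the paper: they assume a test set of 20 positive and 14 negative samples, chosen only because they give figures that agree with the reported ones to within the last decimal digit. The function name `classification_metrics` and the choice of which leukemia class counts as "positive" are likewise assumptions.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn: positive-class samples classified correctly/incorrectly.
    tn/fp: negative-class samples classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total      # share of all samples classified correctly
    sensitivity = 100.0 * tp / (tp + fn)      # true positive rate
    specificity = 100.0 * tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts (an assumption, not data from the paper):
# 20 positive samples (17 correct) and 14 negative samples (13 correct).
acc, sens, spec = classification_metrics(tp=17, fn=3, tn=13, fp=1)
print(f"accuracy={acc:.2f}%  sensitivity={sens:.2f}%  specificity={spec:.2f}%")
```

Under these assumed counts, accuracy is 30/34, sensitivity is 17/20, and specificity is 13/14, which shows how the three reported figures can arise together from a single confusion matrix.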
Keywords: Cancer Classification, Canonical Correlation Analysis (CCA), Dimensionality Reduction (DR), Genetic Algorithm (GA), Neural Network (NN), Principal Component Analysis (PCA)
This work is licensed under a Creative Commons Attribution 3.0 License.