
An Integrated Algorithm for Dimension Reduction and Classification Applied to Microarray Data of Neuromuscular Dystrophies


  • School of Computer Science and Engineering, Lovely Professional University, Chaheru, Phagwara - 144411, Punjab, India
  • School of Computer Applications, Lovely Professional University, Chaheru, Phagwara - 144411, Punjab, India
  • School of Biosciences, Lovely Professional University, Chaheru, Phagwara - 144411, Punjab, India


Background/Objectives: Microarray technology allows neuromuscular dystrophies to be predicted from gene expression patterns. Microarray gene expression data suffer from the curse of dimensionality, i.e. tens of thousands of genes but only a few samples, so the dimension must be reduced for accurate diagnosis. Methods/Statistical Analysis: First, five-fold cross-validation is applied to partition the samples at random. Two feature selection techniques, the t-test and entropy, are employed to rank and select genes. K-nearest neighbor and linear support vector machine classifiers are then used to classify diseased samples on the basis of the ranked genes. The performance of these integrated techniques is tested on microarray datasets of neuromuscular dystrophies, namely Juvenile Dermatomyositis (JDM) and Facioscapulohumeral Muscular Dystrophy (FSHD). Findings: Effective disease-specific genes are selected from thousands of genes. The performance measures show that the integration of entropy with k-nearest neighbor outperformed the other combinations on both datasets, giving 89.47% accuracy on the JDM dataset and 100% accuracy on the FSHD dataset. This is the first application of this integration of methods to these two disease datasets. It can be applied to other neuromuscular disorder datasets as well.


Dimension Reduction, Entropy, K-Fold Validation, K-Nearest Neighbor, Neuromuscular Dystrophy, Support Vector Machine.
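
The pipeline summarized in the abstract (rank genes, keep the top few, classify with k-nearest neighbor under five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the synthetic data, the parameter values (`top=5`, `k=3`), and the Gaussian form of the entropy score (a symmetric relative entropy between two fitted normal distributions) are all assumptions made for illustration. The linear support vector machine is omitted for brevity. Genes are ranked inside each training fold so that held-out samples do not influence selection, which is a design choice of this sketch.

```python
import math
import random
from statistics import mean, variance

def t_score(a, b):
    """Absolute Welch t statistic for one gene across two classes."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def entropy_score(a, b):
    """Symmetric relative entropy of Gaussians fitted to each class (assumed form)."""
    m1, m2, v1, v2 = mean(a), mean(b), variance(a), variance(b)
    if v1 == 0 or v2 == 0:
        return 0.0
    return 0.5 * (v1 / v2 + v2 / v1 - 2) + 0.5 * (m1 - m2) ** 2 * (1 / v1 + 1 / v2)

def rank_genes(X, y, score, top):
    """Return the indices of the `top` genes with the highest score."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(score(a, b))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:top]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, score, top=5, k=3, folds=5, seed=0):
    """Accuracy of rank-then-kNN over a random (unstratified) k-fold split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        # Select genes using the training fold only.
        genes = rank_genes([X[i] for i in train], [y[i] for i in train], score, top)
        tr_X = [[X[i][g] for g in genes] for i in train]
        tr_y = [y[i] for i in train]
        for i in test:
            x = [X[i][g] for g in genes]
            correct += knn_predict(tr_X, tr_y, x, k) == y[i]
    return correct / len(X)

# Synthetic stand-in for expression data: 20 samples x 50 genes,
# where only genes 0 and 1 differ between the two classes.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(6.0 * label if g < 2 else 0.0, 1.0) for g in range(50)] for label in y]

acc_t = cross_validate(X, y, t_score)
acc_entropy = cross_validate(X, y, entropy_score)
print("t-test + kNN accuracy:", acc_t)
print("entropy + kNN accuracy:", acc_entropy)
```

With real microarray data, `X` would hold one row per sample and one column per gene. The split here is not stratified, so with very small classes a training fold may need at least two samples per class for the variance estimates to be defined.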






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.