Optimal Feature for Text Similarity based Hybrid Clustering Technique with Aid of MGWO


  • JNTUH, Hyderabad - 500085, Telangana, India


Objectives: Clustering high dimensional data is often inexact and may fall short of the expected quality as the dimensionality of the dataset grows, and the problem is attracting considerable research attention. Methods/Analysis: The high dimensional input data is first passed to a similarity measure for text processing, which evaluates the similarity between the categorical data as the basis for feature selection. Optimal feature selection, an important topic in data mining and particularly so for high dimensional datasets, is then performed with a Modified Grey Wolf Optimization (MGWO) technique. The selected features are then grouped by a hybrid of two clustering techniques. Findings: The performance of the proposed technique is evaluated in terms of clustering accuracy, Jaccard coefficient and Dice's coefficient, and is compared with existing clustering algorithms. Novelty/Improvements: The primary intention of this research is to achieve promising results in text similarity based clustering. To that end, the k-means and fuzzy c-means clustering algorithms are hybridized to group the optimal features.
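The Jaccard and Dice's coefficients named among the evaluation measures can be illustrated with a minimal sketch using their standard set-based definitions. The abstract does not state how the coefficients are computed in the study (for example over token sets or over pairs of cluster assignments), so the function names, the token-set inputs and the handling of two empty sets below are assumptions made only for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def dice(a, b):
    """Dice's coefficient 2|A & B| / (|A| + |B|) of two collections treated as sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    if total == 0:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return 2 * len(a & b) / total


# Hypothetical example inputs: two short texts split into word tokens.
doc1 = "grey wolf optimization for feature selection".split()
doc2 = "feature selection for text clustering".split()
print(jaccard(doc1, doc2), dice(doc1, doc2))
```

Both measures range from 0 (no overlap) to 1 (identical sets), and Dice's coefficient is never smaller than the Jaccard coefficient for the same pair of sets.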


Fuzzy C Means Clustering, Grey Wolf Optimization, Jaccard Coefficient and Dice's Coefficient, K Means, Similarity Measure for Text Processing.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.