Document Clustering using a New Similarity Measure based on Energy of a Bipartite Graph

Affiliations

  • Department of Mathematics, School of Advanced Sciences, VIT University, Chennai - 600127, Tamil Nadu, India

Abstract


Objectives: This paper clusters documents using a new similarity measure based on the energy of a bipartite graph. Methods/Statistical Analysis: Documents are given a bipartite graph representation and clustered with the proposed similarity measure, which is derived from the energy of that graph. The proposed algorithm is illustrated on a small document set. Findings: The proposed algorithm gives better clustering quality than the k-means clustering algorithm. Application/Improvements: The proposed algorithm can be extended and applied to cluster large document sets.
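The abstract does not define the similarity measure itself, so the sketch below illustrates only the underlying quantity: the energy of a graph in the standard sense, that is, the sum of the absolute values of its adjacency eigenvalues. For a bipartite graph with biadjacency matrix B, this equals twice the sum of the singular values of B. The document-term matrix `doc_term` and both function names are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import numpy as np


def bipartite_energy(biadjacency):
    """Energy of a bipartite graph, computed from its biadjacency matrix B.

    The adjacency matrix is [[0, B], [B^T, 0]]; its nonzero eigenvalues are
    plus and minus the singular values of B, so the energy is twice their sum.
    """
    B = np.asarray(biadjacency, dtype=float)
    singular_values = np.linalg.svd(B, compute_uv=False)
    return 2.0 * float(singular_values.sum())


def adjacency_energy(biadjacency):
    """Same energy, computed directly from the full adjacency matrix."""
    B = np.asarray(biadjacency, dtype=float)
    n_docs, n_terms = B.shape
    A = np.block([
        [np.zeros((n_docs, n_docs)), B],
        [B.T, np.zeros((n_terms, n_terms))],
    ])
    # A is symmetric, so eigvalsh returns its real eigenvalues.
    return float(np.abs(np.linalg.eigvalsh(A)).sum())


# Hypothetical toy data: rows are documents, columns are terms,
# and an entry of 1 means the term occurs in the document.
doc_term = np.array([
    [1, 1, 0],
    [0, 1, 1],
])

energy = bipartite_energy(doc_term)
```

The two functions should agree up to floating-point error. The SVD form is cheaper because it works on the documents-by-terms matrix rather than on the full square adjacency matrix.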

Keywords

Bipartite Graph, Cluster Quality, Document Clustering, Energy, Similarity Measure.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.