Total views : 216

Significant Subgraph Mining with Representative Set

Affiliations

  • PVPSIT, Vijayawada - 520007, Andhra Pradesh, India
  • Department of Computer Science and Engineering, JNTUH College of Engineering, JNTU Hyderabad - 500085, Telangana, India
  • Department of Computer Science and Engineering, JNTU Kakinada, Kakinada – 533003, Andhra Pradesh, India

Abstract


Objectives: To mine significant subgraphs with user specified objective functions from a set of graphs that are useful for understanding the intrinsic characteristics of data in a scalable approach. Methods/Statistical Analysis: A large number of candidate subgraphs generated during mining process causes both computational and statistical problem. In this paper, Significant SubGraph Mining-SSGM proposes an algorithm to find significant subgraphs by using a small set of representative patterns - coreset that overcomes these problems. Furthermore, an edge graph notation is used to represent a graph that enables to mine patterns directly without using separate mining algorithm. Findings: The number of possible candidates is generally exponential in search space and techniques employed are mostly focussed on monotonic property. The proposed algorithm offers simple, yet efficient optimizations to significantly improve performance by pruning the search space and exploring representative graphs. It avoids enumeration of all frequent subgraphs which cause redundancy and extreme mining time. The identified coreset elements are extended that provide optimal solution patterns and adopted edge graph notation mine subgraphs directly. Application/Improvements: Experimental results shows that the proposed algorithm is effective and efficient for mining significant subgraphs in terms of computational cost, scalability and time over existing methods. The algorithm can be applied to find different types of significant patterns in a scalable manner by using any objective function according to the problem domain including support, correlation measure and feature set.

Keywords

Frequent Graphs, Objective Function, Representative Set, Statistical Significance, Subgraph Mining.

Full Text:

 |  (PDF views: 162)

References


  • Cook D, Holder L. Mining Graph Data. John Wiley and Sons Inc; 2007.
  • Ulaganathan PP, Thirusangu K, Selvam B. Super edge-magic total labelling in extended duplicate graph of path. Indian Journal of Science and Technology. 2011 May; 4(5). DOI: 10.17485/ijst/2011/v4i5/30069.
  • Ulaganathan PP, Thirusangu K, Selvam B. Graceful and Skolem graceful labelling in extended duplicate graph of path. Indian Journal of Science and Technology. 2011 Feb; 4(2). DOI: 10.17485/ijst/2011/v4i2/29943.
  • Kumar R, Raghavan P, Rajagopalan S, Sivakumar D. Tomkins A, Upfal E. The Web as a Graph. ACM PODS Conference; NY. 2000. p. 1–10.
  • Maio D, Maltoni D. A structural approach to fingerprint classification. Proceedings of 13th International Conference on Pattern Recognition; Vienna, Austria. 1996. p. 578–85.
  • Eichinger F, Bohm K, Huber M. Improved software fault detection - with Graph Mining. Workshop on Mining and Learning with Graphs; 2008. p. 1–3.
  • Gupta MS, Pathak A, Chakrabarti S. Fast algorithms for top-k personalized pagerank queries. WWW Conference; Beijing, China. 2008 Apr. p. 1225–6.
  • Koyuturk M, Grama A, Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics. 2014; 20(1):200–7.
  • Zhou B, Pei J. Preserving privacy in social networks against neighborhood attacks. ICDE Conference; Cancun. 2008. p. 506–15.
  • Das K, Ray J, Mishra D. Gene selection using information theory and statistical approach. Indian Journal of Science and Technology. 2015 May; 8(S9). DOI: 10.17485/ijst/2015/v8iS9/51494.
  • Ramya N. On colourings of wheel graph (Wn). Indian Journal of Science and Technology. 2014 Mar; 7(S3). DOI:10.17485/ijst/2014/v7i3S/50405.
  • Bordino I, Donato D, Gionis A, Leonardi S. Mining large networks with subgraph counting. IEEE ICDM Conference; Pisa. 2008 Dec. p. 737–42.
  • Inokuchi A, Washio T, Motoda H. An apriori-based algorithm for mining frequent substructures from graph data. Principles of Data Mining and Knowledge Discovery; UK. 2000. p. 13–23.
  • Nijssen S, Kok JN. The Gaston tool for frequent subgraph mining. Proceedings of the International Workshop on Graph-Based Tools, Grabats, Electronic Notes In Theoritical Computer Science. Rome, Italy. 2004 Oct; 127(1):77–87.
  • Kuramochi M, Karypis G. An efficient algorithm for discovering frequent subgraphs. IEEE Transaction on Knowledge Data Engineering. 2004; 16(9):1038–105.
  • Yan X, Han J. gSpan: Graph-based substructure pattern mining. Proceedings of 2002 IEEE International Conference on Data Mining; 2002 Sept. p. 1–26.
  • Huan J, Wang W, Prims J. Efficient mining of frequent subgraph in the presence of isomorphism. University of North Carolina Computer Science Technical Report; 2003. p. 1–12.
  • Huan J, Wang W, Prins J, Yang J. SPIN: Mining maximal frequent subgraphs from graph databases. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; NY. 2004. p. 581–6.
  • Vedanayaki M. Graph mining techniques, tools and issues - A study. Indian Journal of Science and Technology. 2014 Nov; 7(S7). DOI: 10.17485/ijst/2014/v7iS7/61962.
  • Kudo T, Maeda E, Matsumoto Y. An application of boosting to graph classification. Advances in Neural Information Processing Systems 18 (NIPS’04); 2004. p. 1–8.
  • Ranu S, Singh AK. GraphSig: A scalable approach to mining significant subgraphs in large graph databases. Proceedings of IEEE International Conference on Data Engineering; Shanghai. 2009 Apr. p. 844–55.
  • Yan X, Cheng H, Han J, Yu PS. Mining significant graph patterns by scalable leap search. Proceedings of SIGMOD; USA. 2008. p. 433–44.
  • Deshpande M, Kuramochi M, Wale N, Karypis G. Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering. 2005; 17(18):1036–50.
  • Kramer S, Raedt LD, Helma C. Molecular feature mining in HIV data. KDD ’01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; USA. 2001. p. 136–43.
  • Yan X, Yu PS, Han J. Graph indexing: A frequent structure-based approach. Proceeding of SIGMOD; NY. 2004. p. 335–46.
  • Cheng H, Yan X, Han J, Hsu C. Discriminative frequent pattern analysis for effective classification. Proceeding of ICDE; Istanbul. 2007. p. 716–25.
  • Hasan M, Chaoji V, Salem S, Besson J, Zaki M. ORIGAMI: Mining representative orthogonal graph patterns. Proceeding of ICDM; NE. 2007. p. 153–62.
  • Tan P, Kumar V, Srivastava J. Selecting the right interestingness measure for association patterns. Proceeding of SIGKDD; 2002. p. 32–41.
  • Wale N, Ning X, Karypis G. Trends in chemical graph data mining. Managing and Mining Graph Data, Springer. 2010 Dec; 40:581–606.
  • Agarwal PK, Peled SH, Varadarajan K. Geometric approximation via corsets. International Journal of Engineering Goodman. Welzl, editor. Combinatorial and Computational Geometry, Cambridge University Press; 2005. p. 1–30.
  • Bunke H, Shearer K. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters. 1998; 19(3-4):255–9.
  • Chen JH, Linstead E, Swamidass SJ, Wang D, Baldi P. Chem DB - Update-full-text search and virtual chemical space. Bioinformatics. 2007; 23(17):2348–51.
  • Available from: http://pubchem.ncbi.nlm.nih.gov

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.