Improving Triangle-Graph Based Text Summarization using Hybrid Similarity Function

Affiliations

  • Faculty of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia

Abstract


Objective: Extractive summarization selects the most relevant sentences from a source document while preserving its most important information. Graph-based techniques have become very popular for text summarization. This paper introduces a hybrid graph-based technique for single-document extractive summarization.

Methods/Statistical Analysis: Prior research that applied the graph-based approach to extractive summarization used a single similarity function to compute the summary. In this work, we instead propose a hybrid similarity function (H) for this computation. H combines four distinct similarity measures: cosine similarity (sim1), Jaccard similarity (sim2), word alignment-based similarity (sim3) and window-based similarity (sim4). The method uses a trainable summarizer that takes several features into account, and the effect of these features on the summarization task is investigated.

Findings: Combining the traditional similarity measures (cosine and Jaccard) with dynamic programming approaches (word alignment-based and window-based) to calculate the similarity between two sentences captured more of the information the sentences share, which helped to identify the best sentences for the final summary. The proposed method was evaluated with ROUGE measures on the DUC 2002 dataset. The experimental results showed that specific combinations of features can give higher efficiency, and that some features affect summary creation more than others.

Applications/Improvements: Measured by ROUGE score on the DUC 2002 dataset, the method gives promising results compared with several existing techniques.
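The hybrid similarity function H can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names the four measures but not their exact definitions or how they are combined. Here sim3 is assumed to be a Smith-Waterman-style local alignment over word tokens, normalised to [0, 1]; sim4 is assumed to be the overlap of contiguous k-word windows (a simple stand-in, not a dynamic-programming formulation); and H is assumed to be a weighted average with equal weights. The function names and the `weights`, `k`, `match`, `mismatch` and `gap` parameters are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Lowercase whitespace tokenization (assumption; no stemming or stop-word removal).
    return sentence.lower().split()


def cosine_sim(a, b):
    # sim1: cosine of the two term-frequency vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    # sim2: |A intersect B| / |A union B| over the sets of words.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def alignment_sim(a, b, match=2, mismatch=-1, gap=-1):
    # sim3 (assumed form): Smith-Waterman local alignment over words, scored with
    # dynamic programming and divided by the best score the shorter sentence allows.
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best / (match * min(len(a), len(b)))


def window_sim(a, b, k=2):
    # sim4 (assumed form): Jaccard overlap of contiguous k-word windows.
    wa = {tuple(a[i:i + k]) for i in range(len(a) - k + 1)}
    wb = {tuple(b[i:i + k]) for i in range(len(b) - k + 1)}
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def hybrid_sim(s1, s2, weights=(0.25, 0.25, 0.25, 0.25)):
    # H: weighted average of sim1..sim4 (equal weights are an assumption).
    a, b = tokenize(s1), tokenize(s2)
    scores = (cosine_sim(a, b), jaccard_sim(a, b), alignment_sim(a, b), window_sim(a, b))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For example, `hybrid_sim("The cat sat on the mat", "A cat sat on a mat")` returns a value between 0 and 1, and identical sentences of two or more words score 1 up to floating-point rounding. Keeping every measure in [0, 1] lets each weight be read as that measure's share of H.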
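The title and keywords refer to triangle counting on a sentence graph, a step the abstract does not spell out. The sketch below assumes one plausible version: sentences are nodes, an edge joins two sentences whose similarity (for instance H) reaches a threshold, and sentences are ranked by how many triangles they lie on. The `threshold` and `size` parameters, the function names, and the tie-breaking rule are hypothetical choices, not taken from the paper.

```python
from itertools import combinations


def build_graph(sim, threshold):
    # Undirected sentence graph: edge (i, j) when sim[i][j] >= threshold (assumed rule).
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def triangle_counts(adj):
    # For each node, count the triangles {i, j, k} it belongs to.
    # Each triangle is visited exactly once, through its ordered vertices i < j < k.
    counts = {v: 0 for v in adj}
    for i in adj:
        for j in adj[i]:
            if j <= i:
                continue
            for k in adj[i] & adj[j]:
                if k > j:
                    for v in (i, j, k):
                        counts[v] += 1
    return counts


def select_sentences(sim, threshold, size):
    # Keep sentences lying on at least one triangle, rank them by triangle count
    # (ties go to the earlier sentence), and return the top `size` in document order.
    counts = triangle_counts(build_graph(sim, threshold))
    ranked = sorted((v for v in counts if counts[v] > 0), key=lambda v: (-counts[v], v))
    return sorted(ranked[:size])
```

Any symmetric similarity matrix works as input; in practice it would hold H for every sentence pair. Counting each triangle only through i < j < k avoids counting it six times, once per vertex ordering.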
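The method is evaluated with ROUGE on DUC 2002. As a rough illustration of what a ROUGE-N recall score measures, the sketch below computes clipped n-gram overlap between a candidate summary and one or more reference summaries. It is a simplified version written for illustration (whitespace tokenization, no stemming or stop-word removal), not the official ROUGE toolkit, so its numbers will not match published ROUGE scores exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of contiguous n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    # ROUGE-N recall: matched reference n-grams divided by total reference n-grams,
    # summed over all references. Each match is clipped to the candidate's count.
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For instance, `rouge_n_recall("the cat sat", ["the cat sat on the mat"], n=1)` gives 0.5, since the candidate covers 3 of the reference's 6 unigrams.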

Keywords

Extractive Summarization, Feature Extraction, Graph-Based Summarization, Hybrid Similarity, Sentence Similarity, Triangle Counting

This work is licensed under a Creative Commons Attribution 3.0 License.