Comparison of Performance in Text Mining using Categorization of Unstructured Data

Affiliations

  • Department of Media Engineering, Tongmyung University, Korea, Republic of
  • Department of Information Security, Tongmyung University, Korea, Republic of
  • Choonhae College of Health, Korea, Republic of

Abstract


Background/Objectives: Text mining helps users find information in very large document collections. It has been actively developed and applied in many fields, mainly for English-language documents, whereas research on Korean text mining has been relatively limited. As the volume of big data containing Korean text grows, Korean text mining is becoming more important, and the need for intensive study and application is increasing. Methods/Statistical Analysis: In this study, we compared the classification performance of Bayesian methods, k-NN, decision trees, SVM, and a neural network in assigning unstructured newspaper articles to given categories. Findings: In the experiments, the SVM model achieved a higher F-measure than the other models and showed stable results in both classification accuracy and recall. It also achieved a high F-measure when classifying articles into a more fine-grained set of categories. Application/Improvements: Although k-NN and decision trees performed slightly worse than SVM, they proved to be appropriate models for classification problems because they are easy to interpret and quick to train.
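
The F-measure used to compare the models is commonly defined as the harmonic mean of precision and recall. The sketch below shows one way to compute it per category for single-label predictions. The abstract does not say how scores were averaged across categories, so the macro-average here is an assumption, and the category names and labels are hypothetical illustration data, not results from the study.

```python
from collections import Counter


def per_category_scores(gold, predicted):
    """Return {category: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct assignment to category g
        else:
            fp[p] += 1          # article wrongly assigned to category p
            fn[g] += 1          # article missed for its true category g

    scores = {}
    for cat in sorted(set(gold) | set(predicted)):
        assigned = tp[cat] + fp[cat]
        actual = tp[cat] + fn[cat]
        precision = tp[cat] / assigned if assigned else 0.0
        recall = tp[cat] / actual if actual else 0.0
        total = precision + recall
        # F-measure (F1): harmonic mean of precision and recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[cat] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-category F1 (an assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical gold labels and model output for six articles
gold = ["politics", "sports", "sports", "economy", "politics", "economy"]
predicted = ["politics", "sports", "economy", "economy", "sports", "economy"]

scores = per_category_scores(gold, predicted)
for category, (p, r, f1) in scores.items():
    print(f"{category}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"macro F1 = {macro_f1(scores):.2f}")
```

Running the same function on the predictions of each classifier would give directly comparable per-category and overall F-measure values.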

Keywords

Categorization, Decision Tree, k-NN, Naive Bayes, Text Mining.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.