News Classification: A Data Mining Approach
Objectives: Text classification is one of the important applications of data mining. It assigns text documents to predefined class labels on the basis of words, phrases, combinations of words, etc. Method/Analysis: The present study classifies news data into four predefined classes, namely Business, Entertainment, Sports and Technology. WEKA, an open-source data mining tool, is used for text classification. Different classification algorithms are applied to the news data set and compared on accuracy, time, errors and ROC to identify the best algorithm for news data set classification. Findings: The study analyzes the results on the basis of accuracy, time, error and ROC curve, and concludes that the Naïve Bayes Multinomial algorithm is best for news classification.
Keywords: Classification Algorithms, Data Mining, Text Classification, WEKA.
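To make the idea of multinomial Naive Bayes text classification concrete, the following is a minimal sketch in plain Python. It is not the WEKA implementation used in the study and does not reproduce its data or results: the class `MultinomialNB`, the helper `tokenize`, and the toy headlines are all hypothetical and chosen only for illustration. The sketch assumes whitespace tokenization and add-one (Laplace) smoothing, and it reports accuracy only, not time, error or ROC.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class MultinomialNB:
    """Illustrative multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_doc_counts = Counter(labels)   # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, two training documents per class.
train_docs = [
    "stock market shares profit bank",
    "company profit quarterly earnings market",
    "film actor movie award music",
    "singer album music film premiere",
    "match team goal player league",
    "player scored tournament team win",
    "software smartphone app internet chip",
    "internet startup software device chip",
]
train_labels = [
    "Business", "Business",
    "Entertainment", "Entertainment",
    "Sports", "Sports",
    "Technology", "Technology",
]

test_docs = [
    "bank shares fall as market dips",
    "team win league match",
    "new smartphone app released",
    "actor wins award for film",
]
test_labels = ["Business", "Sports", "Technology", "Entertainment"]

model = MultinomialNB().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Scores are summed in log space so that long documents do not underflow, and the smoothing term keeps a single unseen word from driving a class probability to zero. A comparison like the one described in the abstract would repeat the fit-and-evaluate step for each algorithm on the same train/test split.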
This work is licensed under a Creative Commons Attribution 3.0 License.