Machine Learning Classifiers: Evaluation of the Performance in Online Reviews

Affiliations

  • Faculty of Science and Technology, Sunway University, 5, Jalan Universiti, Bandar Sunway, Subang Jaya, Selangor - 47500, Malaysia

Abstract


Objectives: This paper evaluates the performance of machine learning classifiers and identifies the most suitable classifier for classifying sentiment value. In this study, the term “sentiment value” refers to the polarity (positive, negative or neutral) of a text.

Methods/Analysis: This work evaluates machine learning classifiers from the WEKA (Waikato Environment for Knowledge Analysis) toolkit, a collection of tools for data mining and classification. Classifier performance was evaluated using overall accuracy, recall, precision, the kappa statistic and a few visualization techniques. The results of this analysis were then used to identify the most suitable classifier for classifying sentiment value.

Findings: Results show that two classifiers, from the Rules and Trees categories, perform equally well and better than the classifiers from the other categories (Bayes, Functions, Lazy and Meta).

Novelty/Improvement: This paper explores the performance of machine learning classifiers in classifying the sentiment value of online reviews. The data used have not previously been used to explore the performance of machine learning classifiers.
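The evaluation measures named in the abstract (overall accuracy, precision, recall and the kappa statistic) can all be derived from a confusion matrix. The following is a minimal Python sketch of those calculations, not the WEKA implementation used in the paper. The function names and the ten-review toy data are hypothetical and serve only as illustration; they are not the study's data or results.

```python
from typing import Dict, List

# Polarity classes as defined in the abstract.
LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(actual: List[str], predicted: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def precision_recall(matrix: List[List[int]],
                     labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision (column-wise) and recall (row-wise)."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted_as = sum(row[i] for row in matrix)  # column total
        actually_is = sum(matrix[i])                  # row total
        scores[label] = {
            "precision": true_pos / predicted_as if predicted_as else 0.0,
            "recall": true_pos / actually_is if actually_is else 0.0,
        }
    return scores


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    if expected == 1.0:  # degenerate case: only one class present
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for ten reviews (illustration only).
actual = ["positive", "positive", "positive", "negative", "negative",
          "neutral", "neutral", "neutral", "positive", "negative"]
predicted = ["positive", "positive", "neutral", "negative", "positive",
             "neutral", "neutral", "negative", "positive", "negative"]

matrix = confusion_matrix(actual, predicted, LABELS)
overall_accuracy = accuracy(matrix)
per_class = precision_recall(matrix, LABELS)
kappa = cohen_kappa(matrix)

print("confusion matrix:", matrix)
print("accuracy:", round(overall_accuracy, 3))
print("kappa:", round(kappa, 3))
for label, score in per_class.items():
    print(label, {name: round(value, 3) for name, value in score.items()})
```

Kappa is reported alongside accuracy because it discounts the agreement expected by chance, which matters when the polarity classes are unevenly represented.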

Keywords

Comments, Machine Learning Classifiers, Online Reviews, Polarity, Sentiment Analysis.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.