Predicting the Sentimental Reviews in Tamil Movie using Machine Learning Algorithms

Affiliations

  • Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapetham, Amrita University, Coimbatore – 641112, Tamil Nadu, India

Abstract


Objective: This paper aims to classify Tamil movie reviews as positive or negative using supervised machine learning algorithms. Methods/Analysis: Novel machine learning approaches are needed to analyze social media text, where the volume of data is increasing exponentially. In this work, machine learning algorithms such as SVM, the Maxent classifier, Decision tree and Naive Bayes are used to classify Tamil movie reviews as positive or negative. Features are also extracted from TamilSentiwordnet. Findings: A dataset was prepared for this work. SVM performs well in classifying the Tamil movie reviews compared with the other machine learning algorithms, and both the cross-validation results and the accuracy of the algorithm show this. Apart from SVM, Decision tree also performs well in classifying the Tamil reviews. Novelty/Improvement: SVM achieves an accuracy of 75.9% in classifying Tamil movie reviews, which is a good milestone for research on the Tamil language.
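
As an illustration of one of the classifiers named above, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' implementation: the function names `train_naive_bayes` and `predict`, the whitespace tokenization, and the four toy training reviews (English placeholders standing in for Tamil text) are all assumptions made for this example, and the sketch uses neither TamilSentiwordnet features nor the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect per-class document and word counts.

    docs is a list of (tokens, label) pairs. The returned tuple is a
    hypothetical model layout used only in this sketch.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data invented for illustration; not taken from the paper.
TRAIN = [
    ("good story great acting".split(), "positive"),
    ("great music good direction".split(), "positive"),
    ("boring story bad acting".split(), "negative"),
    ("bad music weak direction".split(), "negative"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "great acting".split()))
print(predict(model, "bad boring".split()))
```

In practice, the SVM, Maxent and Decision tree classifiers mentioned in the abstract would more likely be taken from an existing machine learning library and compared under cross-validation; the sketch only shows the shape of the positive/negative classification task.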

Keywords

Machine Learning, Maxent Classifier, Sentimental Analysis, Support Vector Machine, Tamil Language, TamilSentiwordnet.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.