Comparison of Different Attributes of Authorship Data using Data Mining Approach

Affiliations

  • Amity School of Engineering and Technology, Amity University, Amity Campus, Sector-125, Noida-201303, Uttar Pradesh, India
  • School of Computer Science and Information Technology, University of Hyderabad, Central University P.O., Prof. C.R.Rao Road, Gachibowli, Hyderabad–500046, India

Abstract


In recent years, the rapid growth in Internet usage has generated huge volumes of unstructured data. Such data can be interpreted with various data mining techniques, and many useful patterns can be extracted from it. Classifying the data so that it supports meaningful analysis is the key concept behind this study. In this paper, authorship data for books was used. A data set was created in which various attributes of users were stored along with the books they like to read. Naive Bayes was applied to the data set to find which factor most strongly affects the ratings of the books. The attributes were compared using a data mining tool, and the rating of a book was found to depend strongly on the location of the user. This interpretation was also verified using the measures of precision and recall. High precision indicates that the system's predictions are more accurate.
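
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation or tool: the six toy records, the attribute names (location, age group), and the helper functions below are all hypothetical and chosen only for illustration. The sketch trains a categorical Naive Bayes classifier with add-one smoothing on one attribute at a time and reports precision and recall for the "high" rating class. For brevity it evaluates on the training records, which a real study would avoid.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (location, age_group) -> rating class.
ROWS = [("usa", "young"), ("usa", "adult"), ("usa", "young"),
        ("india", "adult"), ("india", "young"), ("india", "adult")]
LABELS = ["high", "high", "high", "low", "low", "low"]


def train_nb(rows, labels):
    """Count class and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for i, v in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((value_counts[(i, y)][v] + 1) / (n_y + len(vocab[i])))
        if score > best_score:
            best, best_score = y, score
    return best


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def evaluate_attribute(index):
    """Train and score a model that sees only the attribute at `index`."""
    rows = [(row[index],) for row in ROWS]
    model = train_nb(rows, LABELS)
    predictions = [predict_nb(model, row) for row in rows]
    return precision_recall(LABELS, predictions, positive="high")


if __name__ == "__main__":
    for name, index in [("location", 0), ("age_group", 1)]:
        p, r = evaluate_attribute(index)
        print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the location attribute separates the two rating classes perfectly while the age group does not, so the location-only model scores higher on both measures. That mirrors the kind of attribute-by-attribute comparison the abstract describes, without reproducing its actual data or results.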

Keywords

Authorship Data, Information Retrieval, Naïve Bayes, Precision, Recall.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.