
Author profile prediction using pivoted unique term normalization

Affiliations

  • Department of IT, Vardhaman College of Engineering, Shamshabad, Hyderabad - 500018, Telangana, India
  • Department of CSE, JNTUH College of Engineering, Karimnagar - 505501, Telangana, India
  • Department of CSE, Matrusri Engineering College, Hyderabad - 500059, Telangana, India

Abstract


Author profiling is a text classification technique that predicts the demographic characteristics of authors by analyzing their written texts. It has become popular in several information-technology-enabled applications such as marketing, forensic analysis, psychology and entertainment. In the reviews domain, most authors write reviews of several products without disclosing their personal details. In this context, author profiling helps infer characteristics such as gender, age, native language, educational background, location and personality traits from the text alone. Most existing approaches use lexical, content-based, structural, syntactic and semantic features to differentiate the writing styles of authors. These approaches suffer from high-dimensional feature spaces and fail to capture the relationships between features. This paper proposes a new approach that addresses the high-dimensionality problem by aggregating term weights to compute the weight of a document against each author profile. The proposed approach was evaluated on the reviews domain for predicting the gender and age group of authors, using accuracy as the measure.
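
The abstract does not give the weighting formulas, so the following is only a minimal sketch of how term weights might be aggregated into a document weight per profile. It assumes the pivoted unique normalization of Singhal, Buckley and Mitra (1996), in which a term weight is divided by (1 - slope) * pivot + slope * (number of unique terms). The slope value, the function names, the way profile weights are accumulated and the toy reviews are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

SLOPE = 0.2  # illustrative value (assumption); normally tuned on held-out data


def pivoted_unique_weights(tokens, pivot, slope=SLOPE):
    """Weight each term of one document with pivoted unique normalization."""
    counts = Counter(tokens)
    if not counts:
        return {}
    avg_tf = sum(counts.values()) / len(counts)
    # Normalization factor: pivot blended with the document's unique-term count.
    norm = (1.0 - slope) * pivot + slope * len(counts)
    return {
        term: ((1.0 + math.log(tf)) / (1.0 + math.log(avg_tf))) / norm
        for term, tf in counts.items()
    }


def build_profile_weights(labelled_docs, pivot):
    """Sum term weights over the training documents of each profile (assumed scheme)."""
    profiles = {}
    for tokens, label in labelled_docs:
        acc = profiles.setdefault(label, Counter())
        for term, weight in pivoted_unique_weights(tokens, pivot).items():
            acc[term] += weight
    return profiles


def document_weight(tokens, profile_weights, pivot):
    """Collapse a document to one number: its aggregated weight against a profile."""
    doc_weights = pivoted_unique_weights(tokens, pivot)
    return sum(w * profile_weights.get(t, 0.0) for t, w in doc_weights.items())


def predict(tokens, profiles, pivot):
    """Pick the profile against which the document has the largest weight."""
    return max(profiles, key=lambda p: document_weight(tokens, profiles[p], pivot))


# Toy training data (hypothetical): tokenized reviews with a profile label.
train = [
    ("great battery great screen".split(), "profile_a"),
    ("lovely colour lovely fabric".split(), "profile_b"),
]
# Pivot: average number of unique terms per training document.
pivot = sum(len(set(tokens)) for tokens, _ in train) / len(train)
profiles = build_profile_weights(train, pivot)
prediction = predict("great screen".split(), profiles, pivot)
```

Each document is reduced to one weight per profile rather than one feature per vocabulary term, which is how a scheme of this kind would sidestep the high-dimensional feature space described above.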

Keywords

Accuracy, Age Prediction, Author Profiling, Document Weight, Gender Prediction, Pivoted Unique Term Normalization.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.