
Urdu Documents Classification using Naïve Bayes

Affiliations

  • Department of Computer Science and Software Engineering, NED University of Engineering and Technology, Karachi 75270, Pakistan

Abstract


Objectives: The purpose of this conceptual paper is to outline the process of handling Urdu editorials on the basis of Urdu morphology, in order to improve their classification.

Methods: The first step is to collect editorials belonging to different categories. A corpus is formed from the collected data, and preprocessing makes the corpus more reliable and relevant. Naïve Bayes will be used for classification. Naïve Bayes has been reported as a strong choice for document classification models: it is fast, produces accurate results, and is robust to irrelevant features.

Findings: Handling Urdu morphology is one of the biggest tasks in this research. To handle it, the corpus is encoded in UTF-8 and the system locale is changed so that Urdu text appears in readable form.

Application: The main purpose of this approach is the classification of Urdu documents. Urdu is a South Asian language and one of the most widely spoken languages in the subcontinent. Urdu document classification involves all the preprocessing activities, such as language processing tasks, labeling, and tagging; the tool explored will be R. The approach will be helpful for firms that manage and process data in Urdu. For example, it can be applied to Urdu news editorials so that they are classified into subcategories, letting users find the information they are looking for. This research is still in progress and may change over time.
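
The pipeline described above has three stages: UTF-8 corpus loading, preprocessing, and Naïve Bayes classification. The abstract names R as the tool and does not say which Naïve Bayes variant is used. The sketch below is written in Python purely for illustration. It assumes a multinomial Naïve Bayes with add-one (Laplace) smoothing, which predicts the category c that maximizes log P(c) + Σ log P(w | c), where P(w | c) = (count(w, c) + α) / (N_c + α|V|). Several parts of the sketch are hypothetical:

  • the corpus format (one tab-separated label and text per line)
  • the tiny stop-word list
  • every name in it (`parse_corpus`, `tokenize`, `MultinomialNaiveBayes`)

It performs no morphological analysis beyond splitting on non-word characters and dropping stop words.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny Urdu stop-word list (illustration only).
URDU_STOPWORDS = {"کے", "کی", "کا", "اور", "میں", "سے", "ہے", "نے", "کو"}


def parse_corpus(raw_bytes):
    # Decode a corpus stored as UTF-8 bytes, one "label<TAB>text" per line
    # (an assumed format). "utf-8-sig" also strips a leading byte-order mark.
    docs, labels = [], []
    for line in raw_bytes.decode("utf-8-sig").splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, text = line.split("\t", 1)
        labels.append(label)
        docs.append(text)
    return docs, labels


def tokenize(text):
    # In Python 3, \w matches Unicode letters, so Arabic-script Urdu words are
    # kept and punctuation such as the Urdu full stop is discarded. Diacritics
    # are combining marks, not \w characters, so vocalized text would need
    # normalizing first (not done here).
    return [t for t in re.findall(r"\w+", text) if t not in URDU_STOPWORDS]


class MultinomialNaiveBayes:
    # Multinomial Naive Bayes with additive smoothing (Laplace when alpha=1).

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_tokens = {c: sum(w.values()) for c, w in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        # Words never seen in training carry no evidence, so they are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior P(c)
            denom = self.total_tokens[label] + self.alpha * v
            for t in tokens:
                # Sum log-likelihoods instead of multiplying probabilities.
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    raw = (
        "sports\tکرکٹ ٹیم نے میچ جیت لیا\n"
        "sports\tکھلاڑی نے شاندار بیٹنگ کی\n"
        "politics\tحکومت نے نیا بجٹ پیش کیا\n"
        "politics\tوزیراعظم نے پارلیمنٹ سے خطاب کیا\n"
    ).encode("utf-8")
    docs, labels = parse_corpus(raw)
    model = MultinomialNaiveBayes().fit(docs, labels)
    print(model.predict("ٹیم کے کھلاڑی میچ کے لیے تیار ہیں"))  # prints: sports
```

Decoding the file's bytes explicitly makes the decoding step independent of the system's default encoding. An example call is `parse_corpus(open("editorials.tsv", "rb").read())`, where the file name is hypothetical. The Findings instead address the display problem by changing the locale. Scoring in log space avoids numerical underflow when many small word probabilities are combined across a long editorial.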

Keywords

Classification, Classification using Naïve Bayes, Documents Classification, Naïve Bayes Classification, Urdu Documents Classification.





This work is licensed under a Creative Commons Attribution 3.0 License.