Semi-Supervised Distributional Vector Generation Techniques for Text Classification

Affiliations

  • Computer Science and Engineering Department, Jyothi Engineering College, Cheruthuruthy - 679531, Kerala, India

Abstract


Text classification has long held a place as a core research area in text mining. Supervised and unsupervised learning are the two popular paradigms for classification. A relatively novel approach is semi-supervised learning, which lies midway between supervised and unsupervised learning: it refines the learning process by combining a small labeled training set with the large amount of easily available unlabeled data. Semi-supervised learning has two variants. In transductive learning, the labeled and unlabeled data are both given before the classifier is built, and the goal is to predict the class labels of the unlabeled data. In inductive learning, the labeled and unlabeled data are both used in model building, and the goal of the model is to predict the class labels of unseen data. This paper uses transductive learning to classify textual data by considering the words that appear in different parts of a document. Words appearing in the introductory and concluding parts of a document may play a more important role in classification than words appearing elsewhere, so the method assigns different weights to words based on their position in the document. Applying this idea when mapping textual data to numerical patterns yields several variants of distributional vector generation. Because document lengths differ widely, different normalization techniques are also employed, which gives eight different vectors. The non-parametric, simple-to-implement k-nearest neighbour algorithm is used to classify free-flowing text.
The results indicate that semi-supervised text classification can be performed without loss of classification accuracy when only limited labeled data is available, as the accuracies of the supervised and semi-supervised learning models coincide.
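
The pipeline outlined in the abstract (position-dependent word weights, length normalization, then k-nearest neighbour labeling of the unlabeled documents) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' method: the abstract does not give the weighting scheme, the normalizations, or the value of k. The edge fraction of 0.2, the edge weight of 2.0, the use of L2 normalization, Euclidean distance, k = 3, the toy documents, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def position_weighted_vector(tokens, vocab, edge_fraction=0.2, edge_weight=2.0):
    """Count words, weighting those in the opening and closing parts more heavily.

    edge_fraction and edge_weight are assumed values, not taken from the paper.
    """
    n = len(tokens)
    edge = max(1, int(n * edge_fraction)) if n else 0
    counts = Counter()
    for i, tok in enumerate(tokens):
        in_edge = i < edge or i >= n - edge
        counts[tok] += edge_weight if in_edge else 1.0
    return [counts.get(word, 0.0) for word in vocab]


def l2_normalize(vec):
    """Scale a vector to unit length so that document length matters less."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def knn_predict(query, labeled, k=3):
    """Majority vote among the k labeled vectors closest to the query."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): a small labeled set and an unlabeled set,
# both available before classification, as in the transductive setting.
labeled_docs = [
    ("grain prices rose as wheat exports grew and grain demand held", "grain"),
    ("wheat harvest was strong and farmers sold grain early", "grain"),
    ("the bank raised interest rates and the bank cut lending", "money"),
    ("interest on loans climbed as the central bank tightened rates", "money"),
]
unlabeled_docs = [
    "wheat and grain shipments increased",
    "bank rates and interest climbed",
]

# The vocabulary is built from all documents, labeled and unlabeled.
all_texts = [text for text, _ in labeled_docs] + unlabeled_docs
vocab = sorted({tok for text in all_texts for tok in text.split()})


def embed(text):
    return l2_normalize(position_weighted_vector(text.split(), vocab))


labeled_vectors = [(embed(text), label) for text, label in labeled_docs]
predictions = [knn_predict(embed(text), labeled_vectors, k=3) for text in unlabeled_docs]
print(predictions)
```

Swapping `l2_normalize` for another normalization, or changing the edge weight, would give a different vector variant in the spirit of the several variants the abstract mentions.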

Keywords

Distributional Vectors, KNN, Semi-Supervised, Text Classification, Transductive Learning.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.