Characteristic Selection with Rough Sets for Web Page Ranking

Affiliations

  • Department of CSE, GMRIT, Rajam - 532127, Andhra Pradesh, India

Abstract


Objective: To classify web pages and assign them a ranking using feature selection with rough sets and the TF-IDF methodology. Proposed Method: Web page ranking is the process of assigning the position at which a particular site appears in the list of web search results. A site is said to have a high page ranking when it appears at or near the top of that list. A challenge in web page ranking is to provide information relevant to the user’s query, because finding the relevant information within a result set is a tedious process. Ranking is therefore essential to obtain a refined result set that contains the URLs most relevant to the user’s query. For classification, we use a feature reduction method based on Rough Set Theory (RST). Application: Feature selection is an essential technique in rough sets as well as in data mining. Attribute selection is a main challenge in extending and applying rough set theory. Findings: The proposed method emphasizes the removal of unnecessary attributes in order to find an effective reduct set and to form the core of the attribute set. After the classification step, we apply the TF-IDF methodology to assign rankings to the documents.
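
The two steps named in the abstract, attribute reduction with rough sets and TF-IDF ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their reduct search or term-weighting formula, so the sketch assumes an exhaustive search for reducts based on the positive region and a common TF-IDF variant (term frequency normalized by document length, natural-log inverse document frequency). The function names and data layout are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def positive_region(rows, attrs, decision):
    """Indices of rows whose indiscernibility class (w.r.t. attrs)
    maps to exactly one decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def reducts_and_core(rows, attrs, decision):
    """Exhaustive search (exponential in len(attrs), illustration only).
    A reduct is a minimal attribute subset that preserves the positive
    region of the full attribute set; the core is the intersection of
    all reducts."""
    full = positive_region(rows, attrs, decision)
    reducts = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            # A superset of a known reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if positive_region(rows, subset, decision) == full:
                reducts.append(subset)
    core = set(attrs).intersection(*map(set, reducts)) if reducts else set()
    return reducts, core


def tfidf_rank(docs, query):
    """Rank document names by the summed TF-IDF weight of the query terms.
    Assumed variant: tf = count / document length, idf = ln(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0 or not tokens:
                continue  # term absent from the collection, or empty document
            tf = tokens.count(term) / len(tokens)
            score += tf * math.log(n_docs / df)
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy decision table: binary term-presence features and a class label.
pages = [
    {"ball": 1, "goal": 1, "stock": 0, "label": "sports"},
    {"ball": 1, "goal": 0, "stock": 0, "label": "sports"},
    {"ball": 0, "goal": 0, "stock": 1, "label": "finance"},
    {"ball": 1, "goal": 0, "stock": 1, "label": "finance"},
]
reducts, core = reducts_and_core(pages, ["ball", "goal", "stock"], "label")
print(reducts, core)
```

In this toy table only the `stock` attribute is needed to separate the two classes, so it forms the single reduct and the core. A real page collection would need a heuristic reduct search, since the exhaustive loop grows exponentially with the number of terms.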

Keywords

Core, Data Preprocessing, Data Mining, Feature Selection, Rough Set Theory (RST), Reduct, TF-IDF, Text Mining

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.