Improved Parallel PageRank Algorithm for Spam Filtering

Affiliations

  • Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal - 462003, Madhya Pradesh, India
  • Adobe Systems, Noida – 201304, India

Abstract


Background/Objectives: PageRank is a well-known link-based technique introduced by Google for indexing and ranking web pages. The algorithm works on the link structure of the web, that is, the inbound and outbound links of each page. The existing PageRank algorithm follows an equal-distribution rule: it divides a page's PageRank evenly among all of its outgoing links. The problem with this uniform distribution is that uninteresting pages sometimes receive high PageRank values. Methods/Statistical Analysis: This paper proposes an improved parallel PageRank algorithm that distributes a page's PageRank non-uniformly among its outgoing links. The proposed algorithm has been implemented on an NVIDIA Quadro 2000 GPU using CUDA. Findings: The proposed algorithm mitigates spam and gives better computational time than Parallel PageRank, because it assigns higher priority to important pages and lower priority to less important ones. With values assigned in this way, important pages show an increase in PageRank, while unrelated pages, that is, spam pages, show a decrease. Application: The proposed work performs spam filtering by separating important web pages from irrelevant ones.
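
The difference between uniform and non-uniform distribution can be illustrated with a small serial sketch. It is not the paper's CUDA implementation, and the abstract does not state the weighting rule the authors use. The rule below (weights proportional to the target page's in-link count), the function names, the damping factor of 0.85, and the four-page graph are all assumptions chosen for illustration only.

```python
def pagerank(out_links, weights=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    weights(src, dst) returns the unnormalized share of src's rank sent to dst.
    With weights=None every out-link gets an equal share (the uniform rule).
    Assumes every link target is also a key of out_links.
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = out_links[src]
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            raw = [1.0 if weights is None else weights(src, dst) for dst in targets]
            total = sum(raw)
            for dst, w in zip(targets, raw):
                new_rank[dst] += damping * rank[src] * w / total
        rank = new_rank
    return rank


def in_degree_weights(out_links):
    """Hypothetical weighting (an assumption, not the paper's rule):
    favor link targets that many pages link to."""
    in_deg = {p: 0 for p in out_links}
    for targets in out_links.values():
        for dst in targets:
            in_deg[dst] += 1
    return lambda src, dst: in_deg[dst]


# Illustrative graph: "hub" links to a well-linked page and a poorly linked one.
graph = {
    "hub": ["news", "spam"],
    "news": ["hub"],
    "blog": ["news"],
    "spam": ["hub"],
}

uniform = pagerank(graph)
weighted = pagerank(graph, weights=in_degree_weights(graph))
for page in sorted(graph):
    print(page, round(uniform[page], 4), round(weighted[page], 4))
```

In this toy graph only "hub" has more than one outgoing link, so it is the only page whose distribution changes: under the assumed weighting it passes a larger share of its rank to "news" (two in-links) than to "spam" (one in-link), whereas the uniform rule splits it equally. In a GPU version the loop over source pages is the natural candidate for parallelization, but how the paper maps it to CUDA threads is not described in the abstract.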

Keywords

CUDA, GPU, Non-Uniform Distribution, Parallel PageRank, Spam Pages.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.