A Survey on Uniform Resource Locator and Content Matching to Discover Deep-Web Pages


  • Department of Computer Engineering, RMDSSOE, Pune – 411502, Maharashtra, India


Objectives: (1) To harvest deep-web pages efficiently¹; (2) to personalize search according to user interest; and (3) to combine the pre-query and post-query approaches.

Methods/Statistical Analysis: Three methods are used. (1) URL matching matches the query content against URLs: the system obtains links from online sources and from a site database, and the extracted links are matched against the query the user entered. (2) Content matching extracts the links, retrieves the form content of each linked page, and matches it against the user's query; if a match is found, the system calculates how often the query occurs in the form. This step uses the Jsoup library. (3) The pre-query algorithm displays pre-query results as soon as the search box receives focus: when a user logs in, the system selects that user's profile and displays links according to it.

Findings: In existing systems, most search engines display results according to the most visited or most recently added sites. Finding deep-web pages in databases is challenging because they are not registered with any web index and change constantly. In this system, Smart Crawler performs URL matching and content matching to discover deep-web pages. The proposed crawler efficiently retrieves deep-web sources from a wide range of sites and achieves better results than other crawlers. Page ranking is performed, and the highest-ranked results are displayed on the result page. Search is personalized, with results displayed according to the user's profession. Maintaining a log file and pre-query results reduces search time. This is the first crawler to perform personalized search, which makes it unique. The evaluation showed that the proposed approach is more efficient than the existing crawler.

Application/Improvements: The application gathers real-time user profile information from user accounts, so it must be reliable and keep these data safe.
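The abstract describes URL matching and content matching only in prose, and the authors implement content matching with the Java library Jsoup. As a rough illustration, the Python sketch below substitutes the standard-library `html.parser` for Jsoup. Everything in it is an assumption rather than the authors' code: the helper names (`terms`, `FormTextExtractor`, `url_matches`, `form_frequency`, `rank_links`), the tokenizer, the form attributes that are read, the rule that a link is kept when either match succeeds, and ranking by occurrence frequency as a stand-in for the unspecified page-ranking step.

```python
# Hypothetical sketch of URL matching + content matching; not the authors'
# Jsoup-based implementation.
import re
from html.parser import HTMLParser


def terms(text):
    """Lower-case alphanumeric tokens; a crude stand-in for text preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FormTextExtractor(HTMLParser):
    """Collects text and selected attribute values found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while the parser is inside a <form>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.depth += 1
        if self.depth:
            # Assumption: field names, placeholders and values describe the form.
            for name, value in attrs:
                if name in ("name", "placeholder", "value") and value:
                    self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag == "form" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def url_matches(url, query):
    """URL matching: does any query term appear as a whole token of the URL?"""
    url_tokens = set(terms(url))
    return any(t in url_tokens for t in terms(query))


def form_frequency(html, query):
    """Content matching: count occurrences of query terms in the form content."""
    parser = FormTextExtractor()
    parser.feed(html)
    parser.close()
    wanted = set(terms(query))
    return sum(1 for w in terms(" ".join(parser.chunks)) if w in wanted)


def rank_links(pages, query):
    """Keep links that match by URL or by form content; rank by form frequency.

    pages maps url -> html. Returns [(url, frequency), ...], highest first.
    """
    ranked = []
    for url, html in pages.items():
        hit = url_matches(url, query)
        freq = form_frequency(html, query)
        if hit or freq:
            ranked.append((url, freq, hit))
    # Frequency first, then URL hits, then URL text for a stable order.
    ranked.sort(key=lambda r: (-r[1], not r[2], r[0]))
    return [(url, freq) for url, freq, _ in ranked]
```

For example, `rank_links({url: html, ...}, "book title")` returns `(url, frequency)` pairs with the pages whose forms mention the query terms most often first. Matching whole tokens rather than substrings avoids false hits such as "car" inside "scarf", at the cost of missing plurals like "books" unless a stemmer is added.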
This crawler is used as the search engine for an e-learning application and an e-shopping site, and links can be bookmarked for future use. As a future improvement, the crawler can rank pages according to the reviews users enter for each link. In addition, an opened link can display the page content in a task-extraction form, that is, the code, concepts, and URLs on the page.
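The proposed improvement of ranking pages by user-entered reviews is not specified further. A minimal Python sketch, assuming each link has a list of numeric ratings (the `reviews` mapping and the `rank_by_reviews` name are hypothetical), orders links by mean rating and places unreviewed links last in their original order:

```python
# Hypothetical sketch of review-based ranking; the paper only proposes the idea.
from statistics import mean


def rank_by_reviews(links, reviews):
    """Order links by their average user rating, highest first.

    links   -- URLs returned by the crawler
    reviews -- assumed mapping url -> list of numeric ratings (e.g. 1-5)
    Ties are broken by number of ratings, then by original position.
    Links without reviews keep their original relative order at the end.
    """
    def key(item):
        position, url = item
        ratings = reviews.get(url, [])
        if not ratings:
            return (1, 0.0, 0, position)   # unreviewed: after all reviewed links
        return (0, -mean(ratings), -len(ratings), position)

    return [url for _, url in sorted(enumerate(links), key=key)]
```

A plain mean lets a link with a single 5-star review outrank one with many 4- and 5-star reviews; a damped (Bayesian) average that pulls sparsely rated links toward a global prior is a common remedy.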


Keywords: Deep Web, IP, Positioning, Smart Crawler



  • Zhao F, Zhou J, Nie C, Huang H, Jin H. Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces. IEEE Transactions on Services Computing. 2016 Jul/Aug; 9(4):608–20.
  • Gupta S, Bhatia KK. A Comparative Study of Hidden Web Crawlers. International Journal of Computer Trends and Technology (IJCTT). 2014 Jun; 12(3):111–8.
  • Rahman M. Search Engines going beyond Keyword Search: A Survey. International Journal of Computer Applications. 2013 Aug; 75(17):1–8.
  • Gil AB, Rodríguez S, de la Prieta F, De Paz JF. Personalization on E-Content Retrieval based on Semantic Web Services. International Journal of Computer Information Systems and Industrial Management Applications. 2012; 5. ISSN 2150-7988.
  • Chakrabarti S, Den Berg MV, Dom B. Focused crawling: A new approach to topic-specific web resource discovery. Comput Netw. 1999; 31(11):1623–40.
  • Shukla V. Improving the Efficiency of Web Crawler by Integrating Pre-Query Approach. International Journal of Innovative Research in Computer and Communication Engineering. 4(1):172–5.
  • Vijayarani S, Ilamathi J, Nithya. Preprocessing Techniques for Text Mining - An Overview. International Journal of Computer Science and Communication Networks. 5(1):7–16.
  • Kabisch T, Dragut EC, Yu C, Leser U. Deep web integration with VisQI. Proc VLDB Endowment. 2010; 3(1/2):1613–6.
  • Dragut EC, Kabisch T, Yu C, Leser U. A hierarchical approach to model web query interfaces for web source integration. Proc VLDB Endowment. 2(1):325–36.
  • Li W, Yang C, Yang C. An active crawler for discovering geospatial Web services and their distribution pattern - A case study of OGC Web Map Service. International Journal of Geographical Information Science. 24(8):1127–47.
  • Sheng C, Zhang N, Tao Y, Jin X. Optimal algorithms for crawling a hidden database in the web. Proc VLDB Endowment. 2012; 5(11):1112–23.
  • Wu W, Yu C, Doan A, Meng W. An interactive clustering-based approach to integrating source query interfaces on the deep Web. Proc ACM SIGMOD Int Conf Manage Data. 2004. p. 95–106.


This work is licensed under a Creative Commons Attribution 3.0 License.