ANN Models and their Implications in Content Extraction
Objectives: The Internet is a vast repository of information about the past and present that can be used to predict the future. Because it is readily available, users increasingly search the Internet rather than consult a library. This creates a need to find the content of a web page in the shortest possible time, irrespective of the form the page takes. Information and content extraction therefore needs to work at a basic, generic level and be easy to implement without depending on any major software. Methods: The study aims to extract information from the available data after it has been digitized. The digitized data is converted to pixel maps, which are universal and so are unaffected by the form and format of the web page content. A statistical method extracts attributes of the images so that language, and hence text script and format, do not pose problems. The extracted features are then presented to the Back Propagation algorithm. Findings: The accuracy of the method is presented, along with how content extraction is possible within certain bounds. The method was tested using unstructured word sets chosen from web pages, and is demonstrated for monolingual, multilingual and transliterated documents so that its applicability is universal. Applications/Improvement: The method is generic and uses pixel maps of the data, which are software and language independent.
Keywords: Back Propagation, Content Extraction, Information, Statistical, Deterministic.
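The pipeline described under Methods (pixel map, statistical attributes, Back Propagation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the six statistics in `statistical_features`, the single hidden layer, the sigmoid activations, the mean squared error loss, and all hyperparameters. The function names are hypothetical, and the pixel maps are assumed to be small 2-D binary arrays.

```python
import numpy as np


def statistical_features(pixel_map):
    """Summarise a 2-D binary pixel map with six simple statistics.

    The choice of statistics is an illustrative assumption, not the
    feature set used in the study.
    """
    pm = np.asarray(pixel_map, dtype=float)
    height, width = pm.shape
    rows = pm.sum(axis=1)  # ink count per row (horizontal projection)
    cols = pm.sum(axis=0)  # ink count per column (vertical projection)
    return np.array([
        pm.mean(),               # overall ink density
        pm.std(),                # spread of pixel values
        rows.std() / width,      # variation between rows, scaled to [0, 0.5]
        cols.std() / height,     # variation between columns, scaled to [0, 0.5]
        rows.argmax() / height,  # relative position of the densest row
        cols.argmax() / width,   # relative position of the densest column
    ])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_backprop(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network by full-batch backpropagation.

    X has shape (n_samples, n_features); y has shape (n_samples, 1)
    with values in {0, 1}. Returns a predict function and the loss history.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        losses.append(float(np.mean((out - y) ** 2)))
        # Backward pass: error terms for the output and hidden layers
        # (gradient of the squared error, up to a constant factor)
        d_out = (out - y) * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # Gradient descent updates, averaged over the batch
        W2 -= lr * (h.T @ d_out) / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_h) / len(X)
        b1 -= lr * d_h.mean(axis=0)

    def predict(Z):
        return sigmoid(sigmoid(Z @ W1 + b1) @ W2 + b2)

    return predict, losses
```

In use, each word image would be reduced to a feature vector with `statistical_features`, the vectors stacked into `X`, and `train_backprop(X, y)` called with one label per image. Because the features are computed from pixels alone, nothing in the sketch depends on the script or language of the underlying text.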
This work is licensed under a Creative Commons Attribution 3.0 License.