
A Study on Some Tasks, Corpus and Resources of Medical Information Retrieval


  • School of Computing Science and Engineering, VIT University, Vellore – 632014, Tamil Nadu, India


Background/Objectives: This paper gives an overview of some of the tasks involved in the retrieval process, and of the corpora and resources used in medical information retrieval.
Methods/Statistical Analysis: An inverted file representation is used in the retrieval process to associate the documents in the corpus with the search terms they contain. Conventional statistical ranking functions such as Jaccard, Okapi and Euclidean distance have been widely used to rank retrieved medical documents. An extractive, informative, generic, monolingual, single-document summarizer is used to produce medical domain-specific summaries, and a sentence ranking method selects the most appropriate sentences for the final summary.
Findings: Studies reveal that people search the web and read medical information in order to stay informed about their health. In the medical domain, the richest and most widely used source of information is MEDLINE. Because acronyms are used frequently in the medical literature, using the terms that appear in documents directly as indexing keywords is not effective. Likewise, a Bag of Words representation cannot capture the semantic meaning of terms. Domain-specific thesauri such as UMLS, MeSH and the Gene Ontology are available for biomedical retrieval. These thesauri can provide the synonyms, hypernyms and hyponyms of a given term, but they do not take context into account. As a result, the retrieval results reported for domain-specific thesauri are somewhat conflicting. Using Wikipedia as a resource for biomedical retrieval makes it possible to identify which lexical variant of a term should be used in a given context. Conventional ranking functions fail to capture the inherent features of natural language text, whereas ranking based on evolutionary algorithms can enhance retrieval performance. Any domain-specific summarizer must treat the similarity between sentences as an essential feature for summarization.
Applications/Improvements: Retrieval results are improved by using context-aware keywords as indexing keywords and a highly robust hybrid ranking function based on evolutionary algorithms to order the retrieved documents.
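The inverted file representation mentioned in the Methods can be illustrated with a minimal Python sketch. It assumes a toy three-document corpus and a naive lowercase tokenizer, both hypothetical. A real medical retrieval system would also need to normalize acronyms and multi-word terms. Each term maps to the documents that contain it, together with its term frequency, so answering a conjunctive query only requires intersecting posting lists.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a simplifying assumption;
    # acronyms and multi-word medical terms are not handled here).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a posting list {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def boolean_and(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical toy corpus.
docs = {
    "d1": "Metformin is a first-line treatment for type 2 diabetes.",
    "d2": "Insulin therapy in type 1 diabetes.",
    "d3": "Aspirin reduces the risk of myocardial infarction.",
}
index = build_inverted_index(docs)
print(sorted(index["diabetes"]))                     # ['d1', 'd2']
print(sorted(boolean_and(index, "type diabetes")))   # ['d1', 'd2']
```

Storing term frequencies in the posting lists, not just document ids, lets the same index feed the statistical ranking functions named in the abstract.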
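The three conventional ranking functions named in the Methods can be compared on the same toy data. This sketch makes several assumptions. It treats "Okapi" as Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75 and a smoothed IDF, log(1 + (N - n + 0.5) / (n + 0.5)). Jaccard is computed on term sets, and Euclidean distance is computed on raw term-frequency vectors, where a smaller distance means a better match. The corpus and query are hypothetical.

```python
import math
from collections import Counter


def jaccard(query_terms, doc_terms):
    """|Q ∩ D| / |Q ∪ D| over term sets (higher is better)."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def euclidean(query_terms, doc_terms):
    """Distance between raw term-frequency vectors (lower is better)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    vocab = set(q) | set(d)
    return math.sqrt(sum((q[t] - d[t]) ** 2 for t in vocab))


def bm25(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document (higher is better)."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for t in set(query_terms):
        n_t = sum(1 for d in corpus if t in d)  # document frequency of t
        if n_t == 0 or tf[t] == 0:
            continue
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        length_norm = 1 - b + b * len(doc_terms) / avgdl
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
    return score


corpus = [
    "breast cancer screening with mammography".split(),
    "lung cancer risk factors".split(),
    "mammography guidelines for breast imaging".split(),
]
query = "breast cancer mammography".split()
ids = range(len(corpus))
by_bm25 = sorted(ids, key=lambda i: bm25(query, corpus[i], corpus), reverse=True)
by_jaccard = sorted(ids, key=lambda i: jaccard(query, corpus[i]), reverse=True)
by_euclid = sorted(ids, key=lambda i: euclidean(query, corpus[i]))  # ascending distance
print(by_bm25, by_jaccard, by_euclid)  # [0, 2, 1] [0, 2, 1] [0, 2, 1]
```

All three functions agree on this tiny example. They diverge on realistic collections because only BM25 weights rare terms more heavily and normalizes for document length.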
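The Methods describe an extractive summarizer with sentence ranking, and the Findings stress that inter-sentence similarity is an essential feature. Both can be sketched as follows. This is not the summarizer surveyed in the paper. It assumes a naive regular-expression sentence splitter and bag-of-words cosine similarity. Each sentence is scored by a simple degree centrality, the sum of its similarities to all other sentences, in the spirit of graph-based methods such as LexRank. The top-k sentences are returned in their original order.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag(sentence):
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(text, k=2):
    sentences = split_sentences(text)
    bags = [bag(s) for s in sentences]
    # Sentence score = total similarity to every other sentence (degree centrality).
    scores = [
        sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i)
        for i in range(len(bags))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore document order so that the summary reads naturally.
    return [sentences[i] for i in sorted(top)]


# Hypothetical input with one off-topic sentence.
text = (
    "Metformin lowers blood glucose in type 2 diabetes. "
    "Metformin is the first-line drug for type 2 diabetes. "
    "The hospital cafeteria opens at noon. "
    "Blood glucose monitoring helps patients with diabetes."
)
print(summarize(text, k=2))  # the two metformin sentences, in document order
```

The off-topic sentence shares almost no vocabulary with the others, so its centrality score stays low and it is never selected.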
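The Findings note two problems: acronyms in medical text, and thesaurus lookups that ignore context. A small context-aware expansion step illustrates both. The sense inventory `SENSES` below is entirely hypothetical. It stands in for a resource such as UMLS, MeSH or Wikipedia article text. The expansion chosen is the one whose context words overlap most with the rest of the query. This is a simplified Lesk-style heuristic, not the Wikipedia-based method discussed in the paper.

```python
import re

# Hypothetical sense inventory: acronym -> {expansion: context signature words}.
# A real system would derive these from UMLS, MeSH or Wikipedia.
SENSES = {
    "ms": {
        "multiple sclerosis": {"demyelination", "relapse", "neurology", "lesions", "myelin"},
        "mitral stenosis": {"valve", "heart", "murmur", "cardiology", "atrial"},
    },
    "ra": {
        "rheumatoid arthritis": {"joint", "inflammation", "autoimmune", "synovitis"},
        "right atrium": {"heart", "atrial", "chamber", "cardiac"},
    },
}


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query):
    """Append the most context-compatible expansion after each known acronym."""
    words = tokens(query)
    context = set(words)
    expanded = []
    for w in words:
        if w in SENSES:
            # Largest overlap wins; on a tie, max() keeps the first sense listed.
            best_expansion, _ = max(
                SENSES[w].items(), key=lambda kv: len(kv[1] & context)
            )
            expanded.extend([w] + best_expansion.split())
        else:
            expanded.append(w)
    return expanded


print(expand_query("MS heart murmur"))  # ['ms', 'mitral', 'stenosis', 'heart', 'murmur']
```

A context-free thesaurus lookup would add both expansions of "MS" to every query. Conditioning on the surrounding words is what keeps the cardiology query from also retrieving neurology documents.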
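Evolutionary ranking can be sketched as a genetic algorithm that searches for the weights of a linear ranking function. Everything in this sketch is an assumption for illustration:

- Each document is described by two hypothetical feature scores, for instance a BM25 score and a thesaurus-match score.
- Fitness is mean average precision over labelled training queries.
- The operators are uniform crossover, Gaussian mutation and elitism.

This is a plain GA, not the hybrid evolutionary ranking function referred to in the paper.

```python
import random


def average_precision(scores, relevant):
    """AP of the ranking induced by scores (highest first) for a set of relevant doc indices."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if i in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def fitness(weights, training):
    """Mean average precision of the linear ranking function over all training queries."""
    aps = []
    for features, relevant in training:
        scores = [sum(w * f for w, f in zip(weights, fv)) for fv in features]
        aps.append(average_precision(scores, relevant))
    return sum(aps) / len(aps)


def evolve(training, n_features, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Seed the all-ones weight vector (an unweighted sum) plus random individuals.
    pop = [[1.0] * n_features] + [
        [rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, training), reverse=True)
        elite = pop[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [c + rng.gauss(0, 0.1) for c in child]                 # Gaussian mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, training))


# Hypothetical training data: per-document feature vectors and relevant doc indices.
training = [
    ([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]], {0, 2}),
    ([[0.3, 0.9], [0.8, 0.2], [0.6, 0.7], [0.2, 0.1]], {1, 2}),
]
best = evolve(training, n_features=2)
print(best, fitness(best, training))
```

The best quarter of each generation is carried over unchanged, and the all-ones weight vector is seeded into the initial population. As a result, the returned weights are never worse on the training queries than an unweighted sum of the features.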


Keywords: Information Retrieval, Medical Information Retrieval, Medical Document Corpus, Resources, Retrieval Process.





This work is licensed under a Creative Commons Attribution 3.0 License.