
Obtaining Description for Simple Images using Surface Realization Techniques and Natural Language Processing


  • SASTRA University, Thirumalaisamudram, Tanjore - 613401, Tamil Nadu, India
  • GITAM University, NH 207, Doddaballapur Taluk, Bangalore Rural District, Nagadenehalli, Bangalore – 562163, Karnataka, India


This paper develops a simple mechanism for deriving descriptive text (corpora) for an image by combining computer vision and natural language processing techniques. The output of the vision detectors is passed to a sentence formation stage to express the visual content in textual form. The vision detections are smoothed using several approaches to prune combinations of words that are semantically incorrect. Descriptions are generated using syntactic trees and using Markov chains, and the two kinds of output are compared for human likeness by means of a survey. The survey results indicate that the descriptions generated with Markov chains sound more human-like. The generated descriptions can be indexed in Lucene, which can make image search more efficient by bridging the semantic gap.
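As a rough illustration of the Markov chain idea mentioned in the abstract, the following minimal sketch trains a first-order (bigram) word chain on a few sentences and samples a description from it. The abstract does not state the order of the chain, the training corpus, or how detections condition the output, so the toy corpus, the boundary markers, and the function names `train_bigram_chain` and `generate_description` are all assumptions made for illustration and not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sentence-boundary markers (not taken from the paper).
START, END = "<s>", "</s>"


def train_bigram_chain(sentences):
    """Count how often each word is followed by each other word."""
    transitions = defaultdict(Counter)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            transitions[current][following] += 1
    return transitions


def generate_description(transitions, rng, max_words=15):
    """Sample a word sequence by walking the chain from START until END."""
    words = []
    current = START
    while len(words) < max_words:
        candidates = transitions.get(current)
        if not candidates:
            break
        # Sort for a reproducible ordering, then sample by transition count.
        choices = sorted(candidates)
        weights = [candidates[word] for word in choices]
        current = rng.choices(choices, weights=weights, k=1)[0]
        if current == END:
            break
        words.append(current)
    return " ".join(words)


# Toy corpus standing in for captions built from vision detections (assumed).
toy_corpus = [
    "the brown dog sits on the grass",
    "the brown dog runs on the grass",
    "the black cat sits on the sofa",
]

chain = train_bigram_chain(toy_corpus)
description = generate_description(chain, random.Random(0))
print(description)
```

Because each next word is sampled in proportion to how often it followed the current word in the training text, every adjacent word pair in the output has been seen before, which is one plausible reason such output can read as more natural than purely template-driven sentences.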


Attributes, Corpora Extraction, Image Detection, Textual Descriptions Generation






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.