
Knowledge based Approach for English-Malayalam Parallel Corpus Generation


  • Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India


Objective: This paper gives an overview of one part of Natural Language Generation, namely parallel sentence generation, which produces an English sentence together with its Malayalam translation. Methods/Analysis: A template-based sentence generation approach is followed. The proposed system takes input from a manually created bilingual dictionary and fills the slots in each template to generate parallel sentences. Findings: Using the proposed method, we generated a total of 25,208 parallel sentences, which can be used in a bilingual Machine Translation dictionary. Application/Improvement: The proposed system uses only four templates, but increasing the number of templates and updating the dictionary would increase the size of the parallel corpus that can be generated.
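The slot-filling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the slot names (SUBJ, VERB, OBJ), the single template, and the function name generate_parallel are all hypothetical choices made for illustration, and the paper's actual four templates and dictionary are not reproduced here. The sketch assumes that each template records the slot order separately for English (subject-verb-object) and Malayalam (subject-object-verb) and that every dictionary entry pairs an English form with its Malayalam equivalent.

```python
from itertools import product

# Hypothetical toy bilingual dictionary (not the paper's data):
# slot name -> list of (English form, Malayalam form) pairs.
BILINGUAL_DICT = {
    "SUBJ": [("He", "അവൻ"), ("She", "അവൾ")],
    "VERB": [("reads", "വായിക്കുന്നു"), ("writes", "എഴുതുന്നു")],
    "OBJ": [("the book", "പുസ്തകം"), ("the letter", "കത്ത്")],
}

# One hypothetical template. The same slots appear in both languages,
# but in a different order: English is SVO, Malayalam is SOV.
TEMPLATE = {
    "en": ["SUBJ", "VERB", "OBJ"],
    "ml": ["SUBJ", "OBJ", "VERB"],
}


def generate_parallel(template, dictionary):
    """Fill every slot with every dictionary entry and return
    a list of (English sentence, Malayalam sentence) pairs."""
    slots = sorted(set(template["en"]) | set(template["ml"]))
    pairs = []
    # Cartesian product over the candidate entries of each slot.
    for combo in product(*(dictionary[slot] for slot in slots)):
        choice = dict(zip(slots, combo))
        english = " ".join(choice[s][0] for s in template["en"]) + "."
        malayalam = " ".join(choice[s][1] for s in template["ml"]) + "."
        pairs.append((english, malayalam))
    return pairs


corpus = generate_parallel(TEMPLATE, BILINGUAL_DICT)
```

With two entries per slot, this single template yields 2 × 2 × 2 = 8 sentence pairs, which shows why adding templates or dictionary entries grows the corpus multiplicatively. A real system would also need to handle agreement and case marking, which this sketch ignores.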


Keywords: Bilingual, English-Malayalam, Machine Translation, Parallel sentence, Templates.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.