Evaluation of CloudRS Algorithm with De Novo Assemblers


  • SRM University, Kancheepuram 603203, Chennai, Tamil Nadu, India


Objectives: This paper presents a comparative analysis of two prominent de novo assembly algorithms, Velvet and SSAKE, each run in a pipeline after the CloudRS error-correction algorithm.

Methods/Statistical Analysis: The study used Next-Generation Sequencing (NGS) read data. These reads were first error-corrected by pipelining them through CloudRS. The corrected reads were then assembled separately with Velvet and with SSAKE, and the resulting assembly statistics were analyzed to compare the two algorithms statistically under the same conditions.

Findings: Assembling the error-corrected reads produced a set of summary values for each assembler, which were tabulated for comparison. The values compared were the N50 and the corrected length of the assembled sequences, evaluated with standard genome-assembly comparison metrics. The comparison showed that a higher N50 together with a longer error-corrected assembled length indicates a more effective algorithm. The study provides the first comparison of these two assemblers, which had not previously been compared, and gives a better understanding of their relative performance.

Applications/Improvements: The results apply broadly, primarily to ensuring that work involving assembled genome reads proceeds as effectively as possible. Any improved algorithms developed later could further improve the same pipeline. The novelty of the project lies in these comparative results.


Keywords: De Novo assembly, De Bruijn graphs, ReadStack algorithm, MapReduce, Hadoop, ALLPATHS-LG, CloudRS, Velvet, SSAKE, comparison.
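The Findings rank the assemblers mainly by N50. As a minimal sketch (not the authors' code), N50 can be computed from a list of contig lengths by sorting them in decreasing order and returning the length at which the running total first reaches half of the total assembled length. The helper names (`n50`, `summarize`, `better`) and the ranking rule (prefer the higher N50, then the longer total assembled length) are hypothetical illustrations, and the contig lengths below are toy values, not results from this study.

```python
from typing import Dict, Iterable, Sequence, Union

Summary = Dict[str, Union[str, int]]


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    lengths = sorted((length for length in contig_lengths if length > 0), reverse=True)
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid float rounding at exactly 50%.
        if 2 * running >= total:
            return length
    return lengths[-1]  # not reached for non-empty input


def summarize(name: str, contig_lengths: Sequence[int]) -> Summary:
    """Collect the two metrics used in the comparison for one assembly."""
    return {
        "assembler": name,
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "n50": n50(contig_lengths),
    }


def _rank_key(summary: Summary) -> tuple:
    # Hypothetical rule: higher N50 first, longer total assembled length as tie-break.
    return (summary["n50"], summary["total_length"])


def better(a: Summary, b: Summary) -> str:
    """Return the name of the assembly that ranks higher under _rank_key."""
    return str(a["assembler"]) if _rank_key(a) >= _rank_key(b) else str(b["assembler"])


if __name__ == "__main__":
    # Toy contig lengths for two hypothetical assemblies of the same reads.
    assembly_a = summarize("assembly_A", [80, 70, 50, 40, 30, 20, 10])  # N50 = 70
    assembly_b = summarize("assembly_B", [100, 30, 30, 30, 20, 20, 20, 20, 10, 10])  # N50 = 30
    print(assembly_a)
    print(assembly_b)
    print("preferred:", better(assembly_a, assembly_b))  # assembly_A
```

In this toy case assembly_B has the single longest contig (100), yet assembly_A ranks higher because more of its total length sits in long contigs. This is the property N50 captures and a maximum-contig-length comparison would miss.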


This work is licensed under a Creative Commons Attribution 3.0 License.