Comparing and Analyzing the Characteristics of Hadoop, Cassandra and Quantcast File Systems for Handling Big Data

Affiliations

  • Department of Computer Science & Engineering, Faculty of Engineering and Technology, Jamia Hamdard, (Hamdard University), New Delhi -110062, India

Abstract


Objective: With the emergence of the Internet of Things (IoT), colossal amounts of data are being generated by sensors and other computing devices and chips. This paper attempts to provide a lucid comparison among three prominent technologies used for handling Big Data, viz. the Hadoop Distributed File System (HDFS), the Cassandra File System (CFS) and the Quantcast File System (QFS). Apart from these three premier file systems, the paper also explores the newly proposed Atrain Distributed System (ADS) for handling Big Data. Methods: The internals of each of these file systems are described in detail, considering the various aspects of handling Big Data. The paper also offers insight into the situations in which each technology is useful. Findings: Effectively tackling the five V's of Big Data (Variety, Volume, Velocity, Veracity and Value) has become a challenging task for researchers around the world. Hadoop is one such technology: it is open source and is capable of handling Big Data effectively. It breaks the data into fixed-size chunks known as blocks and stores these blocks at distinct locations in a distributed manner. The Cassandra File System is an alternative to Hadoop that eliminates Hadoop's single point of failure, because it follows a masterless, peer-to-peer distributed ring architecture instead of a client-server (master/slave) architecture. The third technology, the Quantcast File System, is written in C++. It also handles Big Data effectively and efficiently, and it claims to save up to fifty percent of disk space by implementing erasure coding. Application: Organizations can adopt any of these frameworks for handling Big Data, depending on the nature of their needs.
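
The block-splitting step that Hadoop performs can be illustrated with a minimal Python sketch. It assumes the HDFS defaults of a 128 MB block size (Hadoop 2.x onward; Hadoop 1.x used 64 MB) and a replication factor of 3. The `Block` record, the `split_and_place` function and the node names are hypothetical, and the simple round-robin placement stands in for the rack-aware replica placement policy that HDFS actually uses.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB   # HDFS default block size since Hadoop 2.x (64 MB in 1.x)
REPLICATION = 3          # HDFS default replication factor


@dataclass
class Block:
    index: int            # position of the block within the file
    offset: int           # byte offset where the block starts
    length: int           # bytes in this block (the last block may be shorter)
    replicas: List[str]   # nodes holding a copy of this block


def split_and_place(file_size: int, datanodes: List[str],
                    block_size: int = BLOCK_SIZE,
                    replication: int = REPLICATION) -> List[Block]:
    """Split a file into fixed-size blocks and give each block `replication`
    copies on distinct nodes. Hypothetical round-robin placement, NOT the
    rack-aware policy that HDFS actually uses."""
    if replication > len(datanodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks: List[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        # Start each block on the next node so load spreads across the cluster.
        first = index % len(datanodes)
        replicas = [datanodes[(first + r) % len(datanodes)]
                    for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
    return blocks


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for b in split_and_place(300 * MIB, nodes):
        print(b.index, b.length // MIB, "MiB on", b.replicas)
```

For a 300 MiB file this yields two full 128 MiB blocks and a final 44 MiB block, each stored on three distinct nodes. In HDFS the mapping from blocks to DataNodes is kept by the NameNode, which is why the NameNode is the single point of failure referred to above (in deployments without NameNode high availability).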
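
The masterless ring can be sketched with consistent hashing: every node hashes to a position (token) on a ring, a row key is hashed onto the same ring, and the key is stored on the first node clockwise from it plus the following nodes, in the spirit of Cassandra's SimpleStrategy. This is a simplified sketch, assuming one token per node (no virtual nodes) and an MD5-based token similar to Cassandra's older RandomPartitioner; current Cassandra releases default to Murmur3Partitioner. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Map a string to a position on the ring using MD5 (a stand-in, similar in
    spirit to Cassandra's older RandomPartitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring with one token per node."""

    def __init__(self, nodes: List[str]):
        # Sort nodes by token so walking the list means walking the ring clockwise.
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int = 3) -> List[str]:
        """Return the rf nodes that store `key`: the first node whose token is
        >= the key's token (wrapping around), then its successors."""
        if not 0 < rf <= len(self._ring):
            raise ValueError("rf must be between 1 and the number of nodes")
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
    print(ring.replicas("user:42"))
```

Because placement depends only on the hash function and the node list, any peer can compute which nodes hold a key without consulting a master; in Cassandra the node list itself is disseminated by a gossip protocol, so no single node is indispensable.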
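
The fifty-percent figure follows from storage arithmetic. With QFS's default Reed-Solomon 6+3 encoding, every six data stripes get three parity stripes, so data occupies 1.5 times its size on disk instead of 3 times under triple replication. The Python sketch below computes that ratio and, to show the recovery idea, uses single XOR parity, the simplest erasure code; it tolerates only one lost stripe and is not the Reed-Solomon code QFS uses. All function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from typing import List, Optional


def replication_factor(copies: int) -> Fraction:
    """Bytes stored per byte of user data when keeping `copies` full replicas."""
    return Fraction(copies)


def erasure_factor(data_stripes: int, parity_stripes: int) -> Fraction:
    """Bytes stored per byte of user data for a (data + parity) erasure code."""
    return Fraction(data_stripes + parity_stripes, data_stripes)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Assumes equal-length stripes; zip would silently truncate otherwise.
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(stripes: List[bytes]) -> bytes:
    """Single parity stripe: XOR of all data stripes."""
    return reduce(xor_bytes, stripes)


def recover(stripes: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing stripe (marked None) from the parity."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost stripe")
    rebuilt = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        # parity XOR all surviving stripes leaves exactly the lost stripe
        rebuilt[missing[0]] = reduce(xor_bytes, survivors, parity)
    return rebuilt


if __name__ == "__main__":
    rs = erasure_factor(6, 3)        # Reed-Solomon 6+3 layout
    rep = replication_factor(3)      # triple replication
    # prints: 6+3 stores 1.5x, replication 3.0x, saving 50%
    print(f"6+3 stores {float(rs)}x, replication {float(rep)}x, "
          f"saving {float(1 - rs / rep):.0%}")
```

With 6 data and 3 parity stripes, any three of the nine stripes can be lost and the data rebuilt, which is at least as many failures as triple replication survives (two lost copies), while using half the disk space.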

Keywords

Atrain, ADS, Cassandra, Hadoop, HDFS, HDInsight, Quantcast

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.