Big Data Analytics with Apache Hadoop MapReduce Framework

Affiliations

  • Department of CSE, K L University, Vaddeswaram - 522502, Andhra Pradesh, India

Abstract


Conventional database management systems cannot handle huge amounts of data. Storing, processing, and accessing massive volumes of data is possible with the help of Big Data technologies. In this paper we discuss the Hadoop Distributed File System (HDFS) and the MapReduce architecture for storing and retrieving information from massive datasets. We propose a WordCount application of the MapReduce object-oriented programming paradigm. It divides the input file into splits or tokens with the help of the java.util.StringTokenizer class. The output file is represented in the form of <key, value> pairs. The experiments are conducted on the Hadoop framework by loading a large number of input files and evaluating the performance of the framework with respect to the MapReduce object-oriented programming paradigm. We examine the performance of the map task and the reduce task by loading a larger number of files and measuring the read-write operations performed by these jobs.
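
The WordCount flow described in the abstract can be illustrated with a minimal sketch. It assumes a plain in-memory simulation and does not use Hadoop's own Mapper and Reducer classes. The class name WordCountSketch and its method names are hypothetical and are not taken from the paper. The map step tokenizes each line with java.util.StringTokenizer and emits a <word, 1> pair per token, and the reduce step sums the values for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory illustration of WordCount; not the paper's Hadoop job.
public class WordCountSketch {

    // Map phase: split one input line into tokens and emit a <word, 1> pair per token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum the values emitted for each key (sorted by key via TreeMap).
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Driver: run the map step over every line, then reduce all emitted pairs.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data with hadoop", "hadoop stores big data");
        // Print each result in the <key, value> form used for the output file.
        wordCount(lines).forEach((word, count) ->
                System.out.println("<" + word + ", " + count + ">"));
    }
}
```

In a real Hadoop job the framework, not a driver loop, would distribute input splits to map tasks and group the emitted pairs by key before the reduce tasks run. The sketch only mirrors the data flow.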

Keywords

Hadoop, HDFS, Job Tracker, MapReduce, NameNode.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.