
Experimental Setup of Logs Analysis on Distributed File Systems using MapReduce

Affiliations

  • Department of Computer Science, Vivekanand Institute of Education Society’s Arts, Science and Commerce College, Mumbai – 400071, Maharashtra, India
  • Department of Computer Science, HVPM, Amravati – 444605, Maharashtra, India

Abstract


The computing world is undergoing a drastic change from traditional non-centralized distributed system architectures, built from parallel and pseudo-distributed nodes scattered across different geographic areas, to a centralized cloud computing architecture in which data transformation and computation can take place on any node. A cloud, or a data centre owned and maintained by a third party, can be formed from a number of physical machines of differing configurations, or from virtual machines that communicate with each other over a shared LAN. We have observed that the performance of a MapReduce program consistently differs depending on the input it is given and on the Distributed File System (DFS) it runs on.

The use case considered here is the analysis of security logs generated on a server machine. To run the program, a mini-cloud was configured on a LAN. The analysis was carried out with a MapReduce program on the data generated by the security software and was tested on several DFSs: Hadoop, Ceph, GlusterFS and zFS. These DFSs were installed on three kinds of infrastructure: a single virtual machine, a cluster of virtual machines, and the mini-cloud. We found MapReduce to be the most suitable technique for log analysis and computation.
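
As a rough illustration of the kind of MapReduce log analysis described above, the following minimal sketch counts log entries by severity in plain Python. It is not the program used in the experiments: the log format, the sample lines, and the names `map_line`, `shuffle`, `reduce_counts` and `run_job` are all assumptions made for illustration, and the job runs in a single process rather than on Hadoop, Ceph, GlusterFS or zFS.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical security-log lines. The real log format is not given
# in the source, so this layout (date, time, severity, event) is assumed.
SAMPLE_LOG = [
    "2017-01-01 10:00:01 WARN login_failed user=alice",
    "2017-01-01 10:00:05 INFO login_ok user=bob",
    "2017-01-01 10:01:12 WARN login_failed user=alice",
    "2017-01-01 10:02:30 ERROR malware_detected host=srv1",
]


def map_line(line):
    """Map step: emit one (severity, 1) pair per well-formed log line."""
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


counts = run_job(SAMPLE_LOG)
print(counts)
```

In a real deployment the map and reduce functions would run on separate nodes and read their input from the DFS, which is where the performance differences between file systems would show up.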


Keywords

Ceph, Gluster, Hadoop, Logs, MapReduce.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.