ANALYSIS OF BIG DATA PROCESSING BY DISTINCT USE OF HADOOP’S MAPREDUCE

I. Geervani, S. Kavya, K. Abdul Hannan, N. Venkatadri

Abstract


Data has become an indispensable part of every economy, industry, organization, business function, and individual, and datasets that grow beyond the size traditional databases can handle are termed Big Data. Even sufficiently large data warehouses are unable to satisfy such storage needs, so companies today use a tool called Hadoop. Hadoop is open-source software that supports parallel and distributed data processing: it stores large datasets reliably in HDFS, processes them with MapReduce, and provides fault tolerance through replication. In this paper, we introduce HDFS and MapReduce and survey the performance of processing sufficiently large datasets with the MapReduce technique. We propose that the performance of dataset processing can be optimized by leveraging MapReduce in different ways, and we analyse this by varying the approach used to process the datasets.
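For concreteness, below is a minimal sketch of a Hadoop MapReduce job in Java: the classic word count over files stored in HDFS. This is an illustrative example, not the workload studied in the paper; the class names (WordCount, TokenizerMapper, IntSumReducer) are ours. Setting a combiner, here the reducer class reused for local aggregation on each mapper node, is one example of leveraging MapReduce in a different way, since it shrinks the data shuffled between the map and reduce phases.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word; also reusable as a combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Removing the setCombinerClass line yields the baseline variant of the same job, which is the kind of single-change comparison that varying the processing approach refers to.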






Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

