Analysis and Performance Evaluation of Large Data Sets Using Hadoop

K V Prasad, A. Abhinay Reddy, A. Shashanka, Ch.Sai kumar chary

Abstract


Hadoop MapReduce is designed for distributed, parallel processing of very large data sets, storing data reliably and streaming it at high bandwidth to user applications. The Hadoop Distributed File System (HDFS) and the MapReduce paradigm together provide easy-to-use distributed storage and parallel processing for analyzing both structured and unstructured data, offering scalability, reliable storage, and fault tolerance. In MapReduce, the programmer specifies Map and Reduce functions, which allow a huge task to be parallelized and executed on a large cluster of commodity machines. In this work, a large volume of data is loaded into HDFS for data processing, evaluation, indexing, and resource utilization. Classification and clustering of the loaded data are then performed using machine learning algorithms. During data processing, missing values and relations are identified, and a performance evaluation is carried out. The system addresses error identification, load balancing, and utilization of system resources at low cost and high performance.
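The Map and Reduce functions mentioned above can be illustrated with a minimal sketch. The following Python snippet is not Hadoop code; it is an assumed, self-contained simulation of the programming model (a word-count job), including the shuffle phase that the Hadoop framework performs between the two user-defined functions:

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit an intermediate (word, 1) pair for every word in a line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce: aggregate all partial counts for one key.
    return (word, sum(counts))

def run_mapreduce(lines):
    # Shuffle: group intermediate values by key, as the Hadoop
    # framework does between the map and reduce phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Apply the reduce function once per key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_mapreduce(["big data big cluster", "data processing"])
```

In a real Hadoop job these functions run on many commodity machines in parallel, with input splits read from HDFS; only the map and reduce logic is supplied by the user.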






Copyright (c) 2017 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 

