Analysis and Performance Evaluation of Large Data Sets Using Hadoop

K V Prasad, A. Abhinay Reddy, A. Shashanka, Ch.Sai kumar chary

Abstract


Hadoop MapReduce is designed for distributed, parallel processing of very large data sets, storing data reliably and streaming it at high bandwidth to user applications. The Hadoop Distributed File System (HDFS) and the MapReduce paradigm together provide easy-to-use distributed storage and parallel processing for analyzing both structured and unstructured data, offering scalability, reliable storage, and fault tolerance. In MapReduce, the programmer specifies Map and Reduce functions, which allow a huge task to be parallelized and executed on a large cluster of commodity machines. In this work, a large volume of data is loaded into HDFS for data processing, evaluation, indexing, and resource utilization. Classification and clustering of the loaded data are then performed using machine learning algorithms. During data processing, missing values and relations are identified, and a performance evaluation is carried out. The system addresses error identification, load balancing, and utilization of system resources at low cost and high performance.
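The Map and Reduce functions mentioned above can be illustrated with a minimal sketch. The following Python snippet is not Hadoop code; it is an assumed, self-contained simulation of the programming model (a word-count job), including the shuffle phase that the Hadoop framework performs between the two user-defined functions:

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit an intermediate (word, 1) pair for every word in a line.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce: aggregate all partial counts for one key.
    return (word, sum(counts))

def run_mapreduce(lines):
    # Shuffle: group intermediate values by key, as the Hadoop
    # framework does between the map and reduce phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Apply the reduce function once per key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_mapreduce(["big data big cluster", "data processing"])
```

In a real Hadoop job these functions run on many commodity machines in parallel, with input splits read from HDFS; only the map and reduce logic is supplied by the user.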






Copyright (c) 2017 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 

