A Traffic Minimization Approach for Big Data in Map Reduce Job by Intermediate Data Partition Technique
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large datasets, and it is amenable to a broad variety of real-world tasks. Scheduling map tasks to improve data locality is crucial to the performance of MapReduce, and many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, the fundamental limits of MapReduce computing clusters with data locality, including the capacity region and theoretical bounds on delay performance, have not been studied. We propose traffic-aware partition and aggregation to reduce the network cost of MapReduce jobs by designing an intermediate data partition scheme. Moreover, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to address the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation dynamically.
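To make the idea of traffic-aware intermediate data partitioning concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical cost model in which sending one unit of intermediate data from a map node to a reduce node costs the network distance between them, and it greedily assigns each key to the reducer with the lowest total cost. The names `traffic_cost`, `hash_partition`, `traffic_aware_partition`, and the sample `volumes` and `distance` values are illustrative assumptions; aggregator placement and reducer load balancing are left out.

```python
from typing import Dict, List

Volumes = Dict[str, List[int]]   # key -> data volume produced by each mapper
Distance = List[List[int]]       # distance[mapper][reducer], assumed known


def traffic_cost(volumes: Volumes, distance: Distance,
                 assignment: Dict[str, int]) -> int:
    """Sum of volume * distance over every (key, mapper) pair."""
    return sum(
        vol * distance[m][assignment[key]]
        for key, per_mapper in volumes.items()
        for m, vol in enumerate(per_mapper)
    )


def hash_partition(volumes: Volumes, num_reducers: int) -> Dict[str, int]:
    """Deterministic stand-in for hash partitioning (index modulo reducers)."""
    return {key: i % num_reducers for i, key in enumerate(sorted(volumes))}


def traffic_aware_partition(volumes: Volumes, distance: Distance,
                            num_reducers: int) -> Dict[str, int]:
    """Greedy sketch: send each key to the reducer that is cheapest for it."""
    assignment = {}
    for key, per_mapper in volumes.items():
        costs = [
            sum(vol * distance[m][r] for m, vol in enumerate(per_mapper))
            for r in range(num_reducers)
        ]
        assignment[key] = min(range(num_reducers), key=costs.__getitem__)
    return assignment


# Hypothetical setup: 2 mappers, 2 reducers, each mapper close to one reducer.
distance = [[1, 3], [3, 1]]
volumes = {"a": [1, 10], "b": [10, 1], "c": [8, 2]}

baseline = hash_partition(volumes, 2)
aware = traffic_aware_partition(volumes, distance, 2)
baseline_cost = traffic_cost(volumes, distance, baseline)
aware_cost = traffic_cost(volumes, distance, aware)
```

In this toy setup the greedy assignment moves key `a` to the reducer near the mapper that produces most of its data, which lowers the total cost relative to the hash-style baseline. A practical scheme would also have to bound the load on each reducer, which this sketch ignores.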
Copyright (c) 2018 Edupedia Publications Pvt Ltd
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All published Articles are Open Access at https://journals.pen2print.org/index.php/ijr/