Reducing Fragmentation for In-line De-duplication Backup Storage via Exploiting Backup History and Cache Knowledge
Abstract
In backup systems, the chunks of each backup are physically scattered after de-duplication, which causes a challenging fragmentation problem. We observe that the fragmentation comes into sparse and out-of-order containers. The sparse container decreases restore performance and garbage collection efficiency, while the out-of-order container decreases restore performance if the restore cache is small. In order to reduce the fragmentation, we propose History-Aware Rewriting algorithm (HAR) and Cache-Aware Filter (CAF). HAR exploits historical information in backup systems to accurately identify and reduce sparse containers, and CAF exploits restore cache knowledge to identify the out-of-order containers that hurt restore performance. CAF efficiently complements HAR in datasets where out-of-order containers are dominant. To reduce the metadata overhead of the garbage collection, we further propose a Container-Marker Algorithm (CMA) to identify valid containers instead of valid chunks. Our extensive experimental results from real-world datasets show HAR significantly improves the restore performance by 2.84-175.36at a cost of only rewriting 0.5-2.03% data..
Keywords
De-Duplication , HAR Algorithm, CAF .
Full Text:
PDFCopyright (c) 2018 Edupedia Publications Pvt Ltd
![Creative Commons License](http://licensebuttons.net/l/by-nc-sa/4.0/88x31.png)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All published Articles are Open Access at https://journals.pen2print.org/index.php/ijr/
Paper submission: ijr@pen2print.org