A Duplicates New Approach for Growth the Performance of Locating

SAMANTH KUMAR THODUPUNOORI, T. MALATHI

Abstract


With the ever-increasing volume of data, data quality problems abound. Duplicates, that is, multiple yet different representations of the same real-world object in the data, are among the most intriguing of these problems, and their effects are detrimental. For instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, and catalogs are mailed multiple times to the same household. Duplicate detection is the process of identifying multiple representations of the same real-world entity, and doing it automatically is difficult. Today, duplicate detection methods must process ever larger datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly difficult. A genetic algorithm is proposed that significantly increases the efficiency of finding duplicates when execution time is limited. It efficiently detects duplication among text documents that have the same content under distinct file names or different content under the same file name.
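
The two document cases named at the end of the abstract can be illustrated with a minimal sketch. This is not the genetic algorithm the paper proposes; it is a plain exact-match baseline using content hashing, and the function name `classify_documents` and the in-memory `docs` mapping are assumptions made only for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePath


def classify_documents(docs):
    """Group documents by content hash and by file name.

    docs: dict mapping a file path (str) to its text content (str).
    Returns (same_content, name_clashes):
      same_content: groups of paths whose content is identical
                    but whose file names differ
      name_clashes: file names that occur with differing content
    Hypothetical helper; exact matching only, not the paper's method.
    """
    paths_by_hash = defaultdict(list)
    hashes_by_name = defaultdict(set)

    for path, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        paths_by_hash[digest].append(path)
        hashes_by_name[PurePath(path).name].add(digest)

    # Case 1: same content stored under distinct file names.
    same_content = sorted(
        sorted(paths)
        for paths in paths_by_hash.values()
        if len({PurePath(p).name for p in paths}) > 1
    )

    # Case 2: same file name used for different content.
    name_clashes = sorted(
        name for name, digests in hashes_by_name.items() if len(digests) > 1
    )
    return same_content, name_clashes


docs = {
    "a/report.txt": "quarterly figures",
    "b/summary.txt": "quarterly figures",
    "c/report.txt": "annual figures",
}
same_content, name_clashes = classify_documents(docs)
```

Hashing catches only byte-identical content. Near-duplicates, which differ in formatting or wording, would need a similarity measure and a search strategy such as the one the paper proposes.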






Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 


Paper submission: ijr@pen2print.org