A New Approach for Increase the Efficiency of Finding Duplicates

Samanth Kumar Thodupunoori, T. Malathi

Abstract


With the ever-growing volume of data, data quality problems abound. Duplicates, that is, multiple yet different representations of the same real-world object in the data, are among the most intriguing of these problems. Their effects are detrimental: for instance, bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, and catalogs are mailed multiple times to the same household. Duplicate detection is the process of identifying multiple representations of the same real-world entity, and doing so automatically is difficult. Today, duplicate detection methods need to process ever larger datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly difficult. A genetic algorithm is proposed that notably increases the efficiency of finding duplicates when the execution time is limited. The approach effectively detects duplicate text files in two cases: files that have the same content under different file names, and files that have different content under the same file name.
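The two file-level cases named at the end of the abstract can be illustrated with a minimal sketch. This is not the genetic algorithm the paper proposes, which the abstract does not specify. It is a plain content-hashing baseline, and the function names `content_digest` and `classify_files` are hypothetical names chosen for the illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify_files(root: Path):
    """Group files under `root` into the two cases from the abstract.

    Illustrative baseline only (exact byte-level matching), not the
    authors' genetic algorithm.
    """
    by_digest = defaultdict(list)  # digest -> paths with that content
    by_name = defaultdict(list)    # file name -> (path, digest) pairs

    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = content_digest(path)
            by_digest[digest].append(path)
            by_name[path.name].append((path, digest))

    # Case 1: identical content stored in more than one file.
    same_content = [paths for paths in by_digest.values() if len(paths) > 1]

    # Case 2: one file name that maps to more than one distinct content.
    same_name_different_content = [
        [path for path, _ in entries]
        for entries in by_name.values()
        if len({digest for _, digest in entries}) > 1
    ]
    return same_content, same_name_different_content
```

Calling `classify_files(Path("some_directory"))` returns two lists of path groups, one per case. Exact hashing only finds byte-identical files; near-duplicate text would need a similarity measure instead.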


Keywords


Increase the Efficiency, Finding Duplicates





Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 

