A Duplicates New Approach for Growth the Performance of Locating

SAMANTH KUMAR THODUPUNOORI; T. MALATHI

Abstract

With the ever-increasing volume of data, data quality problems abound. Duplicates, that is, multiple yet different representations of the same real-world object in the data, are among the most intriguing of these problems, and their effects are detrimental: bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, and catalogs are mailed multiple times to the same household. Duplicate detection is the process of identifying multiple representations of the same real-world entity, and doing it automatically is difficult. Duplicate detection methods now need to process ever larger datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly difficult. A genetic algorithm is proposed that significantly increases the efficiency of finding duplicates when execution time is limited. It efficiently detects duplicate text documents, namely documents that have the same content under distinct file names or different content under the same file name.
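The two document-level cases named at the end of the abstract can be illustrated with a minimal sketch. It is not the genetic algorithm the paper proposes; it only uses exact content hashing to show what each case means. The function name `find_duplicates` and the `(path, text)` input format are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicates(documents):
    """Classify documents into the two cases from the abstract.

    `documents` is an iterable of (path, text) pairs (a hypothetical
    input format). Matching is exact: texts count as the same content
    only if they are byte-identical after UTF-8 encoding.
    """
    by_digest = defaultdict(list)  # content hash -> [(path, file name)]
    by_name = defaultdict(list)    # file name -> [(path, content hash)]

    for path, text in documents:
        name = path.rsplit("/", 1)[-1]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_digest[digest].append((path, name))
        by_name[name].append((path, digest))

    # Case 1: same content stored under distinct file names.
    same_content = sorted(
        sorted(path for path, _ in entries)
        for entries in by_digest.values()
        if len({name for _, name in entries}) > 1
    )

    # Case 2: same file name holding different content.
    name_clashes = {
        name: sorted(path for path, _ in entries)
        for name, entries in by_name.items()
        if len({digest for _, digest in entries}) > 1
    }
    return same_content, name_clashes


docs = [
    ("a/report.txt", "hello"),
    ("b/summary.txt", "hello"),
    ("c/report.txt", "goodbye"),
]
same_content, name_clashes = find_duplicates(docs)
```

Exact hashing misses near-duplicates such as documents that differ only in whitespace or a typo, which is one reason approximate or search-based methods are studied for this problem.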

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

All published Articles are Open Access at https://journals.pen2print.org/index.php/ijr/

Paper submission: ijr@pen2print.org

International Journal of Research
