Duplicate Detection Using Scalable and Progressive Approaches

R. Ashok Kumar, B. Raviteja


With the ever-increasing volume of data, data quality problems abound. Duplicates, that is, multiple yet differing representations of the same real-world object in a dataset, are among the most intriguing of these problems. The consequences of such duplicates are detrimental: bank customers can obtain duplicate identities, inventory levels are monitored incorrectly, catalogs are mailed multiple times to the same household, and so on. Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, duplicate detection methods need to process ever larger datasets in ever shorter time, so maintaining the quality of a dataset becomes increasingly difficult. We present two novel, progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates when the execution time is limited: they maximize the gain of the overall process within the available time by reporting most results much earlier than traditional approaches. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
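To illustrate the idea of progressive duplicate detection described above, the following is a minimal sketch (not the paper's actual algorithms) of a progressive sorted-neighborhood strategy: records are sorted by a key, and neighboring records are compared at increasing rank distance, so the most likely duplicates surface in the earliest passes. The function names, the `difflib`-based similarity measure, and all parameters are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Normalized string similarity in [0, 1]; a stand-in for any
    # domain-specific record comparison function.
    return SequenceMatcher(None, a, b).ratio()

def progressive_duplicates(records, key=lambda r: r, max_window=5, threshold=0.85):
    """Progressive sorted-neighborhood sketch (illustrative, not the
    paper's PSNM/PB algorithms): sort records by a key, then compare
    neighbors at rank distance 1, 2, ..., yielding likely duplicates
    as soon as they are found instead of after a full pass."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for dist in range(1, max_window):            # closest neighbors first
        for pos in range(len(order) - dist):
            i, j = order[pos], order[pos + dist]
            if similarity(key(records[i]), key(records[j])) >= threshold:
                yield (i, j)                     # duplicate pair, reported early

# Usage: near-identical names are reported in the first pass (distance 1),
# before more distant candidate pairs are even examined.
people = ["John Smith", "Jon Smith", "Alice Jones", "John  Smith", "Bob Brown"]
dups = list(progressive_duplicates(people))
```

The point of the progressive ordering is that, if the run is cut off early, the pairs already emitted are the ones most likely to be true duplicates, which matches the abstract's goal of maximizing gain within a limited execution time.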



Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


All published articles are Open Access at https://journals.pen2print.org/index.php/ijr/

Paper submission: ijr@pen2print.org