Productive and Unquestioning Data Duplicate Detection Using REVISE

Alluri Kranthi, G Subrahmanyam

Abstract


Duplicate detection consists of identifying the multiple representations of the same real-world object in a database source, so that every object is represented there only once. It is relevant to data cleaning and data integration applications and has been studied extensively for relational data describing a single type of object in a single table. For hierarchical data, a novel comparison strategy is proposed that uses a graph model of the relationships between objects: pairs of objects at any level of the hierarchy are compared in an order that depends on their relationships. We use Stringer to evaluate the quality of the clusters obtained from many unconstrained clustering algorithms used in concert with approximate join techniques. We also present a novel iterative duplicate detection algorithm called REVISE, which re-examines an object when its influencing neighbors turn out to be duplicates. The main aim of the project is to detect duplicates in structured data, focusing on one specific type of error, namely fuzzy duplicates. Detecting duplicate entities that describe the same real-world object is an important data cleansing task for improving data quality.
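The abstract does not give pseudocode for REVISE, so the following is only a minimal sketch, under our own assumptions, of the general idea it describes: candidate pairs are classified iteratively, and when a pair of objects is found to be duplicates, every pair of objects that lists them as influencing neighbors is queued for re-examination. The names `Obj`, `attr_sim`, `pair_sim` and `detect_duplicates`, the 50/50 weighting (`alpha`) and the 0.8 threshold are hypothetical illustrations, not taken from the paper; Python's `difflib.SequenceMatcher` stands in for whatever fuzzy similarity measure the authors use, and clustering of the resulting pairs (e.g. with Stringer) is left out.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Obj:
    kind: str               # object type; only objects of the same kind are compared
    value: str              # descriptive attribute, compared fuzzily
    neighbors: tuple = ()   # ids of the object's influencing neighbors (hypothetical model)


def attr_sim(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]; a stand-in, not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_sim(x: str, y: str, objs: dict, dups: set, alpha: float = 0.5) -> float:
    """Blend attribute similarity with evidence from already-detected duplicate neighbors."""
    s = attr_sim(objs[x].value, objs[y].value)
    nx, ny = objs[x].neighbors, objs[y].neighbors
    if not nx or not ny:
        return s  # no relationships to use: attribute similarity only
    # Count neighbors of x that are identical to, or known duplicates of, a neighbor of y.
    matched = sum(
        1 for a in nx if any(a == b or frozenset((a, b)) in dups for b in ny)
    )
    rel = matched / max(len(nx), len(ny))
    return alpha * s + (1 - alpha) * rel


def detect_duplicates(objs: dict, threshold: float = 0.8) -> set:
    """Classify candidate pairs, re-examining pairs whose neighbors become duplicates."""
    # Reverse index: which objects list a given object as an influencing neighbor.
    parents = defaultdict(set)
    for oid, obj in objs.items():
        for n in obj.neighbors:
            parents[n].add(oid)

    # Initial candidates: every pair of objects of the same kind.
    pairs = [
        frozenset(p)
        for p in combinations(sorted(objs), 2)
        if objs[p[0]].kind == objs[p[1]].kind
    ]
    queue, queued = deque(pairs), set(pairs)
    dups = set()

    while queue:
        pair = queue.popleft()
        queued.discard(pair)
        x, y = sorted(pair)
        if pair_sim(x, y, objs, dups) < threshold:
            continue
        dups.add(pair)
        # A new duplicate may raise the similarity of pairs that depend on it,
        # so re-enqueue every pair of objects that reference x and y respectively.
        for px in parents[x]:
            for py in parents[y]:
                cand = frozenset((px, py))
                if (px != py and objs[px].kind == objs[py].kind
                        and cand not in dups and cand not in queued):
                    queue.append(cand)
                    queued.add(cand)
    return dups


if __name__ == "__main__":
    objs = {
        "m1": Obj("movie", "The Matrix", ("p1",)),
        "m2": Obj("movie", "Matrix", ("p2",)),
        "p1": Obj("person", "Keanu Reeves"),
        "p2": Obj("person", "Keanu Reves"),
    }
    print(sorted(sorted(p) for p in detect_duplicates(objs)))  # [['m1', 'm2'], ['p1', 'p2']]
```

In this toy example the movie pair is examined first and scores 0.375, so it is rejected. Once the two actor records are classified as duplicates, the movie pair is re-examined and scores 0.875, which clears the threshold. Because the set of detected duplicates only grows and a pair is re-queued only when a new duplicate appears, the loop always terminates.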


Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published articles are Open Access at https://journals.pen2print.org/index.php/ijr/
