A Survey on Various Indexing Technique for Record Linkage

Pranay Tambekar, Roshan Moharle, Komal Kopare

Abstract


Record linkage is the problem of identifying records that refer to the same entities across different data sources. It is an important step in data integration, where it supports merging, matching, and duplicate removal across several databases. De-duplication is the related process of removing duplicate records within a single database. In recent years, data cleaning and standardization have become important steps in data mining tasks. Removing duplicate records is a crucial part of data cleaning, because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the increasing size of today’s databases, the complexity of the matching process has become one of the major challenges for record linkage and de-duplication. This paper presents an analysis of record de-duplication techniques and algorithms that detect and remove duplicate records.
Keywords — Data linkage; record linkage; data mining; clustering; classification.

Copyright (c) 2016 Pranay Tambekar, Roshan Moharle, Komal Kopare

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published articles are open access at https://journals.pen2print.org/index.php/ijr/

