A Survey on Various Indexing Technique for Record Linkage

Pranay Tambekar; Roshan Moharle; Komal Kopare

Abstract

Record linkage is the problem of identifying similar records across different data sources. It is an important process in data integration, used for merging, matching, and removing duplicates across several databases whose records refer to the same entities. De-duplication is the process of removing duplicate records within a single database. In recent years, data cleaning and standardization have become important steps in data mining tasks. Removing duplicate records in a single database is a crucial step in the data cleaning process, because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the increasing size of today’s databases, the complexity of the matching process has become one of the major challenges for record linkage and de-duplication. This paper presents an analysis of record de-duplication techniques and algorithms that detect and remove duplicate records.
Keywords: data linkage; record linkage; data mining; clustering; classification.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

All published Articles are Open Access at https://journals.pen2print.org/index.php/ijr/

Paper submission: ijr@pen2print.org

International Journal of Research
