A Survey on Duplicate Identification using Naive Detection Algorithm

Sanam Siva Rama Raja, Tanuku Pallapu Kiran

Abstract


The presence of duplicate files is a major data quality concern in large databases. Identifying duplicates through attribute resolution is known as duplicate detection. Duplicate detection is the process of detecting all cases in which multiple representations refer to the same real-world entity. This paper proposes a new naive detection algorithm that reduces the search space by making intelligent guesses about which records have a high probability of representing the same real-world entity. A naive algorithm is implemented as a baseline; it generates all possible candidates from the many objects stored within the datasets. A new e-mail abstraction scheme is proposed that considers the e-mail layout structure to display e-mails. For security, a Robust and Collaborative Spam Detection System is used, which possesses an efficient near-duplicate matching scheme and a progressive update scheme. To maintain data quality, duplicate detection is scheduled for all records that match a certain new process. Data cleaning deletes, deactivates, or merges the duplicate files using a change detection method. The proposed system detects duplicate data using ranking methods.

Index Terms: Data cleaning, Record linkage, Naive Bayes, HTML, Pay-As-You-Go
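
The contrast drawn in the abstract between a naive baseline that compares every candidate and a reduced search space can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The function names, the 0.85 threshold, the string-similarity measure from the standard library, and the sorted-window strategy are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (illustrative measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_duplicates(records, threshold=0.85):
    """Baseline: compare every pair of records, O(n^2) comparisons."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


def windowed_duplicates(records, window=2, threshold=0.85):
    """Reduced search space (assumed strategy): sort the records, then
    compare each one only with the next `window - 1` records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: records[i].lower())
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


records = ["John Smith", "Alice Brown", "Jon Smith", "JOHN SMITH"]
print(naive_duplicates(records))
print(windowed_duplicates(records, window=2))
```

A small window performs fewer comparisons and can miss some pairs that the baseline finds. Widening the window trades speed for recall.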





Copyright (c) 2018 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 


Paper submission: ijr@pen2print.org