Cleaning Framework for Big Data: An Interactive Approach for Data Cleaning

Lenkala Revathi Reddy, Swathy P

Abstract


Data is a valuable resource. Used properly, high-quality data helps people make better predictions, analyses, and decisions. However, no matter how much effort goes into collecting a good dataset, errors inevitably creep in, which makes data cleaning necessary. This is a particular concern when large-scale heterogeneous data from multiple sources are integrated for other purposes. Data cleaning can be complicated, time-consuming, and expensive, but it is a necessary step in any data-related system, because poor-quality data may be unfit for its intended purpose. The core of our data cleaning system is data association and data repairing. Association identifies records that refer to the same object and links each object with its most closely associated objects; repairing makes a database reliable by fixing errors in the data. Big data applications do not necessarily need all of the data; in most situations, a small subset of the most relevant data is enough. The goal of association is therefore to reduce big raw data to the small subset that is most useful for a particular application. Once this subset is obtained, the data must be analyzed further to help people digest it and turn it into knowledge. We use a number of techniques to associate the data and to extract knowledge that is useful for data repairing. Our research shows that data association can effectively help with data repairing. To capture the interaction between the two, we provide a uniform framework that seamlessly unifies the association and repairing processes based on context patterns, usage patterns, metadata, and repairing rules.
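As a rough illustration of how association can feed repairing, the sketch below groups records that appear to describe the same object and then repairs a field by majority vote within each group. This is not the framework described in the abstract: the matching key (`normalize`), the majority-vote repair rule, the function names, and the sample records are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def normalize(name):
    # Hypothetical matching key: lowercase and collapse whitespace.
    return " ".join(name.lower().split())


def associate(records, key="name"):
    # Association step (assumed rule): records whose normalized key
    # matches are treated as describing the same object.
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec[key])].append(rec)
    return dict(groups)


def repair(group, field):
    # Repairing step (assumed rule): replace the field in every record
    # of the group with the most common non-empty value in that group.
    values = [r[field] for r in group if r.get(field)]
    if not values:
        # Nothing to vote on; return copies unchanged.
        return [dict(r) for r in group]
    majority, _ = Counter(values).most_common(1)[0]
    return [{**r, field: majority} for r in group]


# Illustrative records only; not data from the paper.
records = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": "acme  corp", "city": "Boston"},
    {"name": "ACME CORP", "city": "Bostn"},
    {"name": "Globex", "city": None},
]

groups = associate(records)
cleaned = [r for g in groups.values() for r in repair(g, "city")]
```

In this sketch the misspelled city in the third record is corrected only because association first linked it to two records that agree with each other, which is the kind of interaction between association and repairing that the abstract refers to. A group with no usable values, such as the single "Globex" record, is left unchanged.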






Copyright (c) 2017 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published articles are Open Access at https://journals.pen2print.org/index.php/ijr/

