A Study of Comparison between Different Types of Document Clustering Techniques in Data Mining

G Venkanna, Syed Thayyab Hussain

Abstract


This paper presents the results of an experimental study of some common document clustering techniques. In particular, we compare the two main approaches to document clustering: agglomerative hierarchical clustering and K-means. (For K-means we used a "standard" K-means algorithm and a variant of K-means, "bisecting" K-means.) Hierarchical clustering is often portrayed as the better-quality clustering approach, but it is limited by its quadratic time complexity. In contrast, K-means and its variants have a time complexity that is linear in the number of documents, but they are thought to produce inferior clusters. Sometimes K-means and agglomerative hierarchical approaches are combined so as to "get the best of both worlds." However, our results indicate that the bisecting K-means technique is better than the standard K-means approach and as good as or better than the hierarchical approaches that we tested, for a variety of cluster evaluation metrics. We propose an explanation for these results that is based on an analysis of the specifics of the clustering algorithms and the nature of document data.
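To make the bisecting variant concrete, the following is a minimal sketch in Python and not the implementation evaluated in the paper. It assumes documents are already represented as dense numeric vectors, uses squared Euclidean distance, and picks the cluster with the largest sum of squared errors (SSE) as the next one to split; the abstract specifies none of these choices. All function names and the toy data are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Standard K-means with K=2; returns the last valid (left, right) split."""
    centers = rng.sample(points, 2)
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points
                    if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        new_right = [p for p in points
                     if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not new_left or not new_right:
            break  # degenerate assignment: keep the previous split, if any
        left, right = new_left, new_right
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly bisect one cluster with 2-means until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break  # nothing left that can be split
        # Assumption: split the cluster with the largest SSE.
        target = max(candidates, key=sse)
        best = None
        for _ in range(trials):
            left, right = two_means(target, rng)
            if not left or not right:
                continue  # skip degenerate splits
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, left, right)
        if best is None:
            break  # e.g. every point in the target is identical
        clusters = [c for c in clusters if c is not target]
        clusters.extend([best[1], best[2]])
    return clusters


# Toy "documents" as 2-D vectors forming three well-separated groups.
docs = [[0, 0], [0, 1], [1, 0],
        [10, 10], [10, 11], [11, 10],
        [20, 0], [20, 1], [21, 0]]
clusters = bisecting_kmeans(docs, k=3)
for i, c in enumerate(clusters):
    print(i, sorted(c))
```

Each bisection runs 2-means on a single cluster only, so the linear-time character of K-means noted above is retained. In practice, document vectors are usually sparse term-weight vectors compared by cosine similarity; the dense Euclidean setup here is only to keep the sketch short.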






Copyright (c) 2017 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 


Paper submission: ijr@pen2print.org