A Study of Comparison between Different Types of Document Clustering Techniques in Data Mining
Abstract
This paper presents the results of an experimental study of some common document clustering techniques. In particular, we compare the two main approaches to document clustering: agglomerative hierarchical clustering and K-means. (For K-means we used a "standard" K-means algorithm and a variant of K-means, "bisecting" K-means.) Hierarchical clustering is often portrayed as the higher-quality clustering approach, but it is limited by its quadratic time complexity. In contrast, K-means and its variants have a time complexity that is linear in the number of documents, but they are thought to produce inferior clusters. K-means and agglomerative hierarchical approaches are sometimes combined to "get the best of both worlds." However, our results indicate that the bisecting K-means technique is better than the standard K-means approach, and as good as or better than the hierarchical approaches we tested, across a variety of cluster evaluation metrics. We propose an explanation for these results based on an analysis of the specifics of the clustering algorithms and the nature of document data.
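To make the difference between standard and bisecting K-means concrete, here is a minimal sketch in plain Python. It assumes documents are already term-weight vectors (for example TF-IDF), compares them with cosine similarity on unit-length vectors, always splits the largest remaining cluster, and keeps the best of several trial 2-means splits as scored by a size-weighted centroid norm. These choices, and all function names (`kmeans`, `bisecting_kmeans`, `cluster_quality`), are illustrative assumptions rather than the exact procedure evaluated in the paper. Each K-means pass costs on the order of n·k similarity computations, which is the linear-in-documents behaviour the abstract contrasts with the quadratic cost of agglomerative clustering.

```python
import math
import random
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def kmeans(docs: List[Vector], k: int, iters: int = 20, rng=None) -> List[List[int]]:
    """Standard K-means on unit vectors; returns k lists of indices into docs."""
    rng = rng or random.Random(0)
    centers = [docs[i] for i in rng.sample(range(len(docs)), k)]  # random seeds
    assign = [-1] * len(docs)
    for _ in range(iters):
        # assign every document to its most similar center
        new_assign = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        for c in range(k):
            members = [docs[i] for i in range(len(docs)) if assign[i] == c]
            if members:
                centers[c] = normalize(centroid(members))
    return [[i for i in range(len(docs)) if assign[i] == c] for c in range(k)]

def cluster_quality(docs: List[Vector], members: List[int]) -> float:
    """Size-weighted squared centroid norm (assumed split criterion).
    For unit vectors, |C| * ||centroid||^2 equals the sum of pairwise cosines / |C|."""
    c = centroid([docs[i] for i in members])
    return len(members) * sum(x * x for x in c)

def bisecting_kmeans(docs: List[Vector], k: int, trials: int = 5, seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    clusters = [list(range(len(docs)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()          # split the largest cluster (assumption)
        if len(target) < 2:
            clusters.append(target)      # nothing left to split
            break
        sub = [docs[i] for i in target]
        best, best_score = None, -math.inf
        for _ in range(trials):          # several 2-means trials, keep the best
            halves = kmeans(sub, 2, rng=rng)
            if any(not h for h in halves):
                continue
            score = sum(cluster_quality(sub, h) for h in halves)
            if score > best_score:
                best, best_score = halves, score
        if best is None:                 # e.g. identical documents: split evenly
            mid = len(target) // 2
            best = [list(range(mid)), list(range(mid, len(target)))]
        clusters.extend([[target[i] for i in h] for h in best])
    return clusters

# Hypothetical toy corpus: rows are term-weight vectors over four terms.
docs = [
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],   # mostly terms 0-1
    [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 1, 4],   # mostly terms 2-3
]
print(sorted(sorted(c) for c in bisecting_kmeans(docs, 2)))
```

Running several trial splits and keeping the best one is what lets bisecting K-means escape the poor local optima that a single run of standard K-means can settle into when both random seeds land in the same topic.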
Copyright (c) 2017 Edupedia Publications Pvt Ltd
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
All published Articles are Open Access at https://journals.pen2print.org/index.php/ijr/