Influence of Functional Words, Term Weighting Measures and Classifiers on Text Classification

Dr. P. Vijaya Pal Reddy

Abstract


Automated text classification is a supervised learning task that assigns a category label to a new document based on a model generated by a classifier from a labeled training set of documents. The training and test documents need to be preprocessed to reduce the influence of non-content (functional) words on the model derived from the training set. This paper examines the influence of functional words on classifier performance. After preprocessing, the documents are represented in a machine-understandable format, the vector space model. The terms in each document are weighted using measures such as Term Frequency-Inverse Document Frequency (TF-IDF), Residual IDF (RIDF), the xI metric, Odds Ratio (OR(t)), Information Gain (IG(t)), Chi-squared (χ2(t, c)) and Mutual Information (MI(t)). The influence of these term weighting measures on the classification of news documents is also examined. Classification models are generated from the vector space representation of the training documents using the Naive Bayes (NB), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers. The performance of the resulting models is measured with precision, recall, F1 and macro-F1 for the various combinations of term weighting measures, with and without functional words.
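The pipeline described above (functional word removal, term weighting, classifier training and evaluation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, uses TF-IDF as the single weighting measure, treats scikit-learn's built-in English stop word list as the set of functional words, and uses the 20 Newsgroups corpus as a stand-in for the paper's news documents.

# Minimal sketch: compare NB, KNN and SVM with and without functional
# (stop) words under TF-IDF weighting, reporting macro-averaged scores.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_recall_fscore_support

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

classifiers = {
    "NB": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": LinearSVC(),
}

# stop_words=None keeps all words; "english" removes functional words.
for stop_words in (None, "english"):
    vectorizer = TfidfVectorizer(stop_words=stop_words)
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)
    for name, clf in classifiers.items():
        clf.fit(X_train, train.target)
        predicted = clf.predict(X_test)
        # Macro-averaged precision, recall and F1 over all categories.
        p, r, f1, _ = precision_recall_fscore_support(
            test.target, predicted, average="macro")
        label = "no functional words" if stop_words else "all words"
        print(f"{name} ({label}): P={p:.3f} R={r:.3f} macro-F1={f1:.3f}")

A fuller study along the lines of the abstract would swap the TF-IDF vectorizer for the other weighting measures listed above and repeat the same train/evaluate loop for each combination.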
