Tweet Segmentation and Its Application to Named Entity Recognition (NER)

Kundeti Solman Raju, I. Vinay, Samrat Krishna

Abstract


Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and the term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts, compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context than term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
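The core optimization described above (choosing the segmentation that maximizes the sum of stickiness scores) can be illustrated with a small dynamic program. This is only a minimal sketch under stated assumptions, not the HybridSeg implementation: the function names segment and make_stickiness, the max_len limit, the weight lam, the linear blend of global and local probabilities, and the toy probability tables are all hypothetical and chosen for illustration. The abstract does not give the actual stickiness formula.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment(words: Sequence[str],
            stickiness: Callable[[Segment], float],
            max_len: int = 3) -> List[Segment]:
    """Return the split of `words` that maximizes the summed stickiness.

    `max_len` (an assumption of this sketch) caps the segment length.
    """
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: best total score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)              # back[i]: start of the last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + stickiness(tuple(words[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Walk the back-pointers from the end to recover the segments.
    segments: List[Segment] = []
    i = n
    while i > 0:
        segments.append(tuple(words[back[i]:i]))
        i = back[i]
    return segments[::-1]


def make_stickiness(global_prob: Dict[Segment, float],
                    local_prob: Dict[Segment, float],
                    lam: float = 0.5) -> Callable[[Segment], float]:
    """Hypothetical score: a linear blend of global and local context.

    This blend is an illustrative assumption, not the paper's formula.
    """
    def score(seg: Segment) -> float:
        return ((1 - lam) * global_prob.get(seg, 0.0)
                + lam * local_prob.get(seg, 0.0))
    return score


# Toy, made-up probabilities for illustration only.
GLOBAL = {("new", "york", "city"): 0.95, ("new", "york"): 0.8,
          ("city", "marathon"): 0.1, ("new",): 0.1, ("york",): 0.1,
          ("city",): 0.1, ("marathon",): 0.2, ("today",): 0.2}
LOCAL = {("new", "york"): 0.6, ("city", "marathon"): 0.9,
         ("marathon",): 0.2, ("today",): 0.2}

tweet = ["new", "york", "city", "marathon", "today"]
hybrid = segment(tweet, make_stickiness(GLOBAL, LOCAL, lam=0.5))
global_only = segment(tweet, make_stickiness(GLOBAL, LOCAL, lam=0.0))
print(hybrid)
print(global_only)
```

With these toy numbers, the global-only score keeps "new york city" together, while blending in the local context prefers "new york" followed by "city marathon". This mirrors, in a simplified way, how local context in a batch of tweets can change the chosen segmentation.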






Copyright (c) 2018 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 

