Application of Named Entity wise Recognition and Tweet Segmentation

N. HAVYA, B. NARASIMHA RAO

Abstract


Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
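
The idea of choosing the segmentation that maximizes the sum of stickiness scores can be illustrated with a small sketch. It is only an illustration under stated assumptions: the abstract does not say how the maximization is carried out, so dynamic programming is used here as one plausible approach. The names segment_tweet, toy_stickiness, TOY_SCORES and max_len are hypothetical, and the hand-written score table merely stands in for the global and local context probabilities described above.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, ...]

# Hypothetical hand-written scores; a real system would derive these
# from global (English) and local (tweet batch) context probabilities.
TOY_SCORES: Dict[Segment, float] = {
    ("new", "york", "city"): 3.0,
    ("new", "york"): 2.0,
}


def toy_stickiness(segment: Segment) -> float:
    """Assumed scoring rule: known phrases get their table score,
    single words a small positive score, unknown phrases a penalty."""
    if segment in TOY_SCORES:
        return TOY_SCORES[segment]
    return 0.1 if len(segment) == 1 else -1.0


def segment_tweet(
    words: List[str],
    stickiness: Callable[[Segment], float],
    max_len: int = 3,
) -> List[List[str]]:
    """Return the split of `words` into consecutive segments (each at
    most `max_len` words) whose stickiness scores sum to the maximum."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for words[:i]
    best[0] = 0.0
    back = [0] * (n + 1)  # back[i]: start index of the last segment

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(words[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers from the end to recover the segments.
    segments: List[List[str]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(words[start:end])
        end = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    print(segment_tweet("i love new york city".split(), toy_stickiness))
```

With the toy scores above, the three-word phrase "new york city" is kept together because its score exceeds the combined score of any split of those words. Capping segment length with max_len keeps the search linear in tweet length for a fixed cap.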






Copyright (c) 2016 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published articles are Open Access at https://journals.pen2print.org/index.php/ijr/


Paper submission: ijr@pen2print.org