An Evaluation of Short Text Similarity Matching from Text Pairs

Rangam Sreelatha, V. Priya Darshini

Abstract


Retrieving semantically similar short texts is a crucial problem for many applications. The cosine similarity coefficient, a measure commonly used in clustering, measures the similarity between groups. However, the cosine-driven CACT approach increases time complexity greatly. We therefore propose to replace the cosine similarity coefficient with the Jaro-Winkler similarity measure to obtain the similarity matching of text pairs. Jaro-Winkler is better suited to computing string similarity because it takes the order of characters into account, using positional indexes to estimate relevancy. For one-to-many data linkage, Jaro-Winkler is expected to outperform the cosine-driven CACT approach. First, we explored record linkage similarity metrics to determine which are suitable for short text. For similarity matching, we then evaluated the performance of Jaro-Winkler, CACT, WordNet-based, and Wikipedia-based measures. After testing these similarity metrics with different models, we found that Jaro-Winkler performed better than the CACT, WordNet-based, and Wikipedia-based measures. In the current scenario,
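
The contrast between the two measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization used for cosine similarity, and the Jaro-Winkler defaults (prefix scale 0.1, prefix cap of 4 characters, the values commonly used in the literature) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (assumed tokenization: whitespace)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: counts matching characters within a sliding window."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: Jaro score boosted for a shared prefix (assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    a, b = "the cat sat", "sat the cat"
    print(cosine_similarity(a, b))  # word order is ignored by bag-of-words cosine
    print(jaro_winkler(a, b))       # character positions affect the score
```

In this sketch, the bag-of-words cosine score treats the two reordered strings as identical, while Jaro-Winkler scores them below 1 because matched characters sit at different positions. That is the order sensitivity the abstract attributes to Jaro-Winkler.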


Copyright (c) 2017 Edupedia Publications Pvt Ltd

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published articles are open access at https://journals.pen2print.org/index.php/ijr/

