Nearest Keyword Set Search in Multidimensional Datasets



Keyword-based search in text-rich multi-dimensional datasets facilitates many novel applications and tools. In this paper, we consider objects that are tagged with keywords and are embedded in a vector space. For these datasets, we study queries that ask for the tightest groups of points satisfying a given set of keywords. We propose a novel method called ProMiSH (Projection and Multi Scale Hashing) that uses random projection and hash-based index structures, and achieves high scalability and speedup. We present an exact and an approximate version of the algorithm. Our experimental results on real and synthetic datasets show that ProMiSH achieves up to 60 times speedup over state-of-the-art tree-based techniques.
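To make the query concrete, the following is a minimal brute-force sketch of a nearest keyword set search. It is not ProMiSH and uses no projection or hashing; it only illustrates what the query asks for. It assumes that "tightest" means the smallest diameter (the largest pairwise Euclidean distance within the group), and the function name `nearest_keyword_set` and the sample data are hypothetical.

```python
from itertools import product
from math import dist


def nearest_keyword_set(points, query):
    """Exhaustively find the tightest group of points covering all query keywords.

    points: list of (coordinates, keyword_set) pairs.
    query:  iterable of keywords that the group must cover.
    Returns (sorted list of point indices, diameter), or (None, inf) if some
    keyword is not carried by any point.
    Assumption: tightness is the group's diameter (max pairwise distance).
    """
    query = list(query)
    # For each query keyword, collect the indices of points tagged with it.
    candidates = [
        [i for i, (_, keywords) in enumerate(points) if q in keywords]
        for q in query
    ]
    if any(not c for c in candidates):
        return None, float("inf")

    best_group, best_diameter = None, float("inf")
    # Try every way of picking one point per keyword (exponential in len(query)).
    for combo in product(*candidates):
        group = sorted(set(combo))  # one point may cover several keywords
        diameter = max(
            (dist(points[a][0], points[b][0])
             for a in group for b in group if a < b),
            default=0.0,  # a single point has diameter 0
        )
        if diameter < best_diameter:
            best_group, best_diameter = group, diameter
    return best_group, best_diameter


# Hypothetical sample data: 2-D points tagged with keywords.
points = [
    ((0.0, 0.0), {"cafe"}),
    ((0.0, 1.0), {"park"}),
    ((5.0, 5.0), {"cafe", "park"}),
    ((9.0, 9.0), {"museum"}),
    ((5.0, 6.0), {"museum"}),
]
result = nearest_keyword_set(points, ["cafe", "park", "museum"])
print(result)
```

The exhaustive enumeration grows exponentially with the number of query keywords, which is the kind of cost that an index-based method such as the one proposed here is meant to avoid.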



Copyright (c) 2017 Edupedia Publications Pvt Ltd

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

