Offline & Online Semantic Web Crawler

Nikita Suryavanshi, Deeksha Singh, Ms. Sunakashi

Abstract


Broad web search engines, as well as many more specialized search tools, rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits must be taken into account in order to achieve high performance at a reasonable cost.

In this paper, we describe the design and implementation of a distributed web crawler that runs on a network of workstations.

The crawler scales to (at least) several hundred pages per second, is resilient against system crashes and other events, and can be adapted to various crawling applications.

We present the software architecture of the system, discuss the performance bottlenecks, and describe efficient techniques for achieving high performance. We also report preliminary experimental results based on a crawl of 120 million pages on 5 million hosts.

Search engines come to our rescue in such cases. With a search engine, all a student has to do is type in the keyword relating to the information needed. The search engine then returns the set of results that best match the keywords entered.

A web search engine can therefore be defined as a software program that takes input from the user, searches its database, and returns a set of results. It is important to note that the search engine does not search the internet itself; rather, it searches its database, which is populated with data from the internet by its crawler. We therefore chose to develop a web search engine together with a ranking method that orders the pages it finds by relevance, so that the user who entered the query finds the most relevant page (the page containing the information the user needs) first. Our project has a feature called “Page Rank & Hits” that allows the user to receive the most relevant results in response to a query. For instance, if the user enters the keyword “student” as a query, the pages containing relevant information are retrieved and then ranked according to their hit ratio and page rank.
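The search-and-rank flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `PAGES` dictionary stands in for the crawler-populated database, the URLs and text are made-up sample data, "hits" is interpreted here as the number of times the keyword occurs on a page, and the final score (hits multiplied by PageRank) is an assumed way of combining the two signals. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> (page text, outgoing links).
# Stands in for the database that the crawler would populate.
PAGES = {
    "a.example/home": ("student portal and student news",
                       ["a.example/courses", "b.example/blog"]),
    "a.example/courses": ("courses for every student", ["a.example/home"]),
    "b.example/blog": ("a blog about web crawlers", ["a.example/home"]),
}


def build_index(pages):
    """Map each keyword to {url: number of occurrences (hits) on that page}."""
    index = defaultdict(dict)
    for url, (text, _links) in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def pagerank(pages, damping=0.85, iterations=50):
    """Power-iteration PageRank over the link graph of the stored pages."""
    n = len(pages)
    rank = {url: 1.0 / n for url in pages}
    for _ in range(iterations):
        new_rank = {url: (1.0 - damping) / n for url in pages}
        for url, (_text, links) in pages.items():
            # Only follow links to pages that are actually in the database.
            targets = [t for t in links if t in pages]
            if not targets:
                # A page with no known outlinks spreads its rank evenly.
                targets = list(pages)
            share = damping * rank[url] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def search(keyword, index, rank):
    """Return matching URLs, best first (assumed score: hits * PageRank)."""
    hits = index.get(keyword.lower(), {})
    return sorted(hits, key=lambda url: hits[url] * rank[url], reverse=True)


index = build_index(PAGES)
rank = pagerank(PAGES)
print(search("student", index, rank))
```

With this sample data, the query “student” matches two pages, and the home page is listed first because it has both more keyword hits and more incoming links. The query never touches the network; it reads only the index built from the stored pages.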

Keywords: ASP.NET; Visual Studio 2008; Visual Management Studio; MySQL






Copyright (c) 2016 Nikita Suryavanshi, Deeksha Singh, Ms. Sunakashi

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

All published Articles are Open Access at  https://journals.pen2print.org/index.php/ijr/ 

