Saturday, 28 February 2015

Review 5.1: A Simple Database Schema for Search Engine

Search engines already have a range of ranking algorithms to choose from. Before studying them, I first need to prepare a database for testing. The schema for this database is shown in the following picture.

The first table, urllist, stores all the URLs that have been indexed by the crawler. The second table, wordlist, holds all the words extracted from those indexed pages. The third table, wordlocation, records which words appear on which page and where each word is located within the page. The remaining two tables describe links between pages. The link table stores pairs of URL IDs, each pair indicating a link from one page to another, and the linkwords table uses wordid and linkid columns to record which words are actually used in each link.
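To make the five tables concrete, here is a minimal sketch of the schema in SQLite using Python's built-in sqlite3 module. Only the table names and the wordid and linkid columns come from the description above. The other column names (id, url, word, urlid, location, fromid, toid), the column types, the sample URLs, and the helper functions build_demo_db and urls_containing are my own assumptions for illustration.

```python
import sqlite3

# Assumed column layout: only the table names and the wordid/linkid
# columns are given in the post; everything else is a guess.
SCHEMA = """
CREATE TABLE urllist      (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE wordlist     (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE wordlocation (urlid INTEGER, wordid INTEGER, location INTEGER);
CREATE TABLE link         (id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER);
CREATE TABLE linkwords    (wordid INTEGER, linkid INTEGER);
"""


def build_demo_db():
    """Create an in-memory database with two made-up pages and one link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.cursor()

    # Two hypothetical pages that the crawler has indexed.
    cur.execute("INSERT INTO urllist (url) VALUES (?)", ("http://example.com/a",))
    page_a = cur.lastrowid
    cur.execute("INSERT INTO urllist (url) VALUES (?)", ("http://example.com/b",))
    page_b = cur.lastrowid

    # Words extracted from those pages.
    cur.execute("INSERT INTO wordlist (word) VALUES (?)", ("search",))
    w_search = cur.lastrowid
    cur.execute("INSERT INTO wordlist (word) VALUES (?)", ("engine",))
    w_engine = cur.lastrowid

    # Page A reads "search engine"; page B reads "engine".
    cur.executemany(
        "INSERT INTO wordlocation (urlid, wordid, location) VALUES (?, ?, ?)",
        [(page_a, w_search, 0), (page_a, w_engine, 1), (page_b, w_engine, 0)],
    )

    # Page A links to page B, and the link text contains "engine".
    cur.execute("INSERT INTO link (fromid, toid) VALUES (?, ?)", (page_a, page_b))
    link_id = cur.lastrowid
    cur.execute(
        "INSERT INTO linkwords (wordid, linkid) VALUES (?, ?)", (w_engine, link_id)
    )

    conn.commit()
    return conn


def urls_containing(conn, word):
    """Return the URLs of all pages that contain the given word, sorted."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.url
        FROM urllist AS u
        JOIN wordlocation AS wl ON wl.urlid = u.id
        JOIN wordlist AS w ON w.id = wl.wordid
        WHERE w.word = ?
        ORDER BY u.url
        """,
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(urls_containing(demo, "engine"))
```

Joining wordlocation against urllist and wordlist is the basic lookup a ranking algorithm would start from: it finds every page that contains a query word, along with the positions stored in the location column.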



