## Saturday, 28 February 2015

### Review 5.4: Using Inbound Links - PageRank Algorithm

PageRank algorithm is invented by Larry Page from Google. It caculate the probability that some random randomly clicking on links will arrive a certain page. A user will always stop clicking after a while. To capture this, PageRank algorthm also using a damping factor to indicate that there is a 85% chance that a user will continue clicking on links at that page. The above pic shows how PageRank algorthm works: User other page's pagerank values and proption that links link to a page to caculate the PageRank value of the page.

Then the PageRank value of page A: 0.15 + 0.85 * (0.4/4 + 0.5/5 + 0.2/1) = 0.15 + 0.85 * (0.1+0.1+0.2) = 0.47

A problem is how we know the pagerank of Page B,C,D. A method is to initial the PageRank value of all pages to 1.0. Then we recaculate those pagerank values one page by one page for several iterations, e.g. 20 iterations.

Code snippet:

``` //formula PR(A) = 0.15 + 0.85 * (PR(P1)/Links(P1)+...+PR(Pn)/Links(Pn))
//initial pagerank for every page is 1.0
//0.85 is damping factor(The probility a person continuing to click on a page), 0.15 is a minimun value which is (1-0.85)
//The more iteration, the more accurate the pagerank is. but 20 iterations is enough
def initDataAndCaculatePageRank(iternum:Int = 20) {
initDataForPageRank()
caculatePageRank(iternum)
}

def initDataForPageRank() = {
val sqllist = List("drop table if exists pagerank",
"create table pagerank (urlid primary key, score)",
"insert into pagerank select rowid,1.0 from urllist")
execTran(sqllist)
}

def caculatePageRank(iternum:Int) = {
for( i ← 0 until iternum )
getAllUrlID.foreach(caculatePageRankFor(_))
}

def caculatePageRankFor(urlid:String) = {
var result = 0.15
val dampfactor = 0.85

val fromIdList = query("SELECT fromid FROM link WHERE toid="+urlid) { rs⇒
bmap(rs.next) {
rs.getString("fromid")
}
}

fromIdList.foreach (
fromid⇒{
val score = query("SELECT score FROM pagerank WHERE urlid="+fromid)(rs ⇒ rs.getDouble("score"))
result += dampfactor * (score / links)
}
)
execTran(List("UPDATE pagerank SET score="+result+" WHERE urlid="+urlid))
}

def getPageRankScore(rows:List[List[String]]) = {
normalize(scorelist)
}

def getPageRankScoreFor(urlid:String) = {
query("SELECT score FROM pagerank WHERE urlid="+urlid){rs⇒
rs.getDouble(1)
}
}

```

1. 2. 3. 