Saturday, 28 February 2015

Review 5.4: Using Inbound Links - PageRank Algorithm


Using Inbound Links: PageRank Algorithm

PageRank algorithm is invented by Larry Page from Google. It caculate the probability that some random randomly clicking on links will arrive a certain page. A user will always stop clicking after a while. To capture this, PageRank algorthm also using a damping factor to indicate that there is a 85% chance that a user will continue clicking on links at that page. 



The above pic shows how PageRank algorthm works: User other page's pagerank values and proption that links link to a page to caculate the PageRank value of the page.

The formula is : PR(A) = (1-0.85) + 0.85 * (PR(B)/Links(B)+PR(C)/Links(C)+PR(D)/Links(D))
Then the PageRank value of page A: 0.15 + 0.85 * (0.4/4 + 0.5/5 + 0.2/1) = 0.15 + 0.85 * (0.1+0.1+0.2) = 0.47

A problem is how we know the pagerank of Page B,C,D. A method is to initial the PageRank value of all pages to 1.0. Then we recaculate those pagerank values one page by one page for several iterations, e.g. 20 iterations.

Code snippet:


 //formula PR(A) = 0.15 + 0.85 * (PR(P1)/Links(P1)+...+PR(Pn)/Links(Pn))
 //initial pagerank for every page is 1.0
 //0.85 is damping factor(The probility a person continuing to click on a page), 0.15 is a minimun value which is (1-0.85)
 //The more iteration, the more accurate the pagerank is. but 20 iterations is enough
 def initDataAndCaculatePageRank(iternum:Int = 20) {
  initDataForPageRank()
  caculatePageRank(iternum)
 }
 
 def initDataForPageRank() = {
  val sqllist = List("drop table if exists pagerank",
      "create table pagerank (urlid primary key, score)",
      "insert into pagerank select rowid,1.0 from urllist") 
  execTran(sqllist)
 }
 
 def caculatePageRank(iternum:Int) = {
  for( i ← 0 until iternum )
   getAllUrlID.foreach(caculatePageRankFor(_))
 }
 
 def caculatePageRankFor(urlid:String) = {
  var result = 0.15
  val dampfactor = 0.85
  
  val fromIdList = query("SELECT fromid FROM link WHERE toid="+urlid) { rs⇒
   bmap(rs.next) {
    rs.getString("fromid")
   }
  }
  
  fromIdList.foreach ( 
   fromid⇒{
    val score = query("SELECT score FROM pagerank WHERE urlid="+fromid)(rs ⇒ rs.getDouble("score"))
    val links = query("SELECT count(*) FROM link WHERE fromid="+fromid)(rs ⇒ rs.getInt(1))
    result += dampfactor * (score / links)
   }
  )
  execTran(List("UPDATE pagerank SET score="+result+" WHERE urlid="+urlid))
 }
 
 def getPageRankScore(rows:List[List[String]]) = {
  val scorelist:Map[String,Double] = rows.map( row⇒Pair(row.head,getPageRankScoreFor(row.head)) ).toMap
  normalize(scorelist)
 }
 
 def getPageRankScoreFor(urlid:String) = {
  query("SELECT score FROM pagerank WHERE urlid="+urlid){rs⇒
   rs.getDouble(1)
  }
 }

1 comment:

  1. I will give you 80 NICHE Relevant Blog Comments safe and effective for your site to Rank on Google and other Search Engine. You get 80 HIgh Quality NICHE Relevent links in a 3 days for only in 5$. 80 Niche Relevant Blog Comments

    ReplyDelete