search ranking: other strategies

Bug #294475 reported by raj
Affects: Open Library
Status: New
Importance: Medium
Assigned to: Edward Betts
Milestone: (none listed)

Bug Description

We thought about adding weight for books and authors that have Wikipedia pages, and adding weight for books with full text available.

raj (raj-archive)
Changed in openlibrary:
assignee: nobody → solrize
importance: Undecided → Medium
Karen Coyle (kcoyle) wrote :

We could add weight to the edition record for:

1. number of fields in the record
2. number of words or bytes in the record
3. presence of table of contents field
4. presence of 'linking' fields (isbn, lccn)
5. presence of subject headings

When we have works, we could add weight for:

6. number of editions of a work
7. allow users to assign 'stars' to a work (weight for stars, and weight for number of users who have assigned stars)
8. number of reviews of the work
9. work has a Wikipedia entry
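
As a rough sketch of how signals like the nine above could be combined, the snippet below computes an additive boost for an edition record and for a work. Everything in it is an assumption made for illustration: the field names (table_of_contents, isbn, lccn, subjects, edition_count, star_ratings, review_count, has_wikipedia), the log scaling, and the weights are hypothetical placeholders, not Open Library's actual schema or tuning.

```python
import math

# Hypothetical weights; real values would need tuning against actual searches.
EDITION_WEIGHTS = {
    "field_count": 0.1,      # 1. number of fields in the record
    "word_count": 0.05,      # 2. number of words in the record
    "has_toc": 1.0,          # 3. table of contents present
    "has_linking_ids": 1.0,  # 4. 'linking' fields (isbn, lccn) present
    "has_subjects": 1.0,     # 5. subject headings present
}
WORK_WEIGHTS = {
    "edition_count": 1.0,    # 6. number of editions of the work
    "avg_stars": 0.5,        # 7a. average star rating
    "star_voters": 0.5,      # 7b. number of users who assigned stars
    "review_count": 1.0,     # 8. number of reviews
    "has_wikipedia": 1.0,    # 9. work has a Wikipedia entry
}


def edition_boost(record: dict) -> float:
    """Additive boost for one edition record (assumed to be a flat dict)."""
    text = " ".join(str(value) for value in record.values())
    signals = {
        "field_count": math.log1p(len(record)),
        "word_count": math.log1p(len(text.split())),
        "has_toc": 1.0 if record.get("table_of_contents") else 0.0,
        "has_linking_ids": 1.0 if record.get("isbn") or record.get("lccn") else 0.0,
        "has_subjects": 1.0 if record.get("subjects") else 0.0,
    }
    return sum(EDITION_WEIGHTS[name] * value for name, value in signals.items())


def work_boost(work: dict) -> float:
    """Additive boost for one work (assumed to be a flat dict)."""
    stars = work.get("star_ratings", [])
    signals = {
        "edition_count": math.log1p(work.get("edition_count", 0)),
        "avg_stars": sum(stars) / len(stars) if stars else 0.0,
        "star_voters": math.log1p(len(stars)),
        "review_count": math.log1p(work.get("review_count", 0)),
        "has_wikipedia": 1.0 if work.get("has_wikipedia") else 0.0,
    }
    return sum(WORK_WEIGHTS[name] * value for name, value in signals.items())
```

Counts are log-scaled here so that a record with ten times as many words does not get ten times the boost; that choice is part of the sketch, not something settled in this report.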

solrize (solrize) wrote :

I'm not too keen on #1-4 of that list, since they mainly tend to upweight recent books, books whose records came from publisher feeds, and books with long blurbs and sample chapters. There was a time when we were indexing the entire contents of ONIX feeds. This included a lot of Danielle Steel novels with sample chapters. The result was that almost every search for any common term got Danielle Steel novels as the top hits. I had to shut off the sample chapters to make that stop.

#6 and #8 seem much more important. #7 and #9 are also important but gameable (remember on Amazon, a huge number of favorable reviews turned out to be written by the book authors). I don't know about #5. I guess it's worth a try.
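
One possible way to act on that split, offered only as a sketch and not something proposed in this report: give the edition and review counts (#6, #8) the larger weights, and cap the contribution of the gameable signals (#7, #9) so that piling on star votes stops helping past a point. The function names, weights, and cap below are all hypothetical.

```python
import math


def damped(value: float, cap: float) -> float:
    """Log-scale a count and clamp it so very large values stop adding weight."""
    return min(math.log1p(max(value, 0.0)), cap)


def work_score(edition_count: int, review_count: int,
               star_votes: int, has_wikipedia: bool) -> float:
    # Signals treated as more important (#6, #8): larger weights, no cap.
    score = 2.0 * math.log1p(edition_count) + 2.0 * math.log1p(review_count)
    # Gameable signals (#7, #9): smaller weights, and star votes are capped.
    score += 0.5 * damped(star_votes, cap=3.0)
    score += 0.5 if has_wikipedia else 0.0
    return score
```

With these placeholder numbers, a work with many editions and reviews outranks one whose only strength is a flood of star votes, since the votes can add at most 0.5 * 3.0 to the score.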

I want to add one of my favorite quotes about searching for books, from the novel "Cryptonomicon" by Neal Stephenson. Randy wants to explore a sunken shipwreck, but knows nothing about diving, so he looks for some books on the subject. The first ones he looks at run out of info at about one third of the depth he wants to dive to. What I'm trying to get at with the quote is that good search ranking should aim to bring out the books with the best and most authoritative information, even when those books are obscure. Thus the emphasis on ranking using external data sources like reviews and citations:

"Randy closes up all of the books and looks at them peevishly for a while. They are all nice new books with color photographs on the covers. He picked them off the shelf because (getting introspective here) he is a computer guy, and in the computer world any book printed more than two months ago is a campy nostalgia item. ...

"He concludes that these are all consumer-grade diving books written for rum-drenched tourists, and furthermore that the publishers probably had teams of lawyers go over them one word at a time to make sure there would not be liability trouble. That the contents of these books, therefore, probably represent about one percent of everything that the authors actually know about diving, but that the lawyers have made sure that the authors don’t even -mention- that. ...

"Randy does a sorting procedure on the diving books now: he ignores anything that has color photographs, or that appears to have been published within the last twenty years, or that has any quotes on the back cover containing the words 'stunning', 'superb', 'user-friendly', or, worst of all, 'easy-to-understand'. He looks for old, thick books with worn-out bindings and block-lettered titles like DIVE MANUAL. Anything with angry marginal notes written by Doug Shaftoe gets extra points. ...
"Now all of a sudden he's reading stuff by guys whose names are preceded by naval ranks and succeeded by M.D.s and Ph.D.s and they are going on for dozens of pages about the physics of nitrogen bubble formation in the knee, for example. ... He develops a sophisticated layman’s understanding of dive medicine, which amounts to little because everyone’s body is different—hence the need for each diver to have a completely different dive plan. Randy will need to figure out his body fat percentage before he can even begin marking up his sheet of graph paper."

solrize (solrize) wrote :

http://tech.slashdot.org/story/09/06/07/194210/Google-Outlines-the-Role-of-Its-Human-Evaluators

Links to: http://digitaldaily.allthingsd.com/20090603/google-and-the-evolution-of-search-scott-huffman/

The linked article mentions that Google now has something like 10,000 people around the world (mostly college students) trained to evaluate search rankings in order to tune PageRank.

George (george-archive) wrote :

Some somewhat random thoughts on ranking if you like... If no good for you, pls mark as invalid/won't fix.

Changed in openlibrary:
assignee: solrize (solrize) → Edward Betts (edwardbetts)