Reading the PDF from the Google guys was very enlightening.
The original PageRank algorithm was very good at taking a “still” image of the WWW and organizing it as a graph with many characteristics that are desirable for navigating it and reaching a final, stable weighting (ranking) after a finite number of iterations.
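That iterative weighting can be sketched with a minimal power-iteration version of PageRank on a toy graph. The graph, tolerance, and iteration cap here are illustrative assumptions; only the damping factor 0.85 comes from the original paper's example.

```python
# Minimal sketch of PageRank power iteration on a tiny invented web graph.

def pagerank(links, d=0.85, tol=1e-8, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform ranking
    for _ in range(max_iter):
        new = {}
        for p in pages:
            # Rank flowing into p from every page q that links to it,
            # split evenly among q's outgoing links.
            incoming = sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * incoming
        if max(abs(new[p] - rank[p]) for p in pages) < tol:
            rank = new
            break  # converged after finitely many iterations
        rank = new
    return rank

# Toy web: A links to B, B links to A and C, C links to A.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
ranks = pagerank(toy)
print({p: round(r, 3) for p, r in ranks.items()})
```

Here A ends up ranked highest because both B and C link to it, which is exactly the backlink property discussed below.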
If we think about this from a mathematical point of view, it is great. But from a real content point of view it is not nearly as good a still image: it reduces the vast space of content into a small set of results that can be manipulated (as any human system can be).
What are its shortcomings?
It has been proven that commercial interest manipulation has taken advantage of the backlink property of the PageRank algorithm. But I think this kind of manipulation will always be possible; it is a matter of how relative information is as a source of “truth”. This has happened since the beginning of information being transmitted by a few to many.
It is up to the person using it to seek other resources to mix with it. The algorithm gives true results, but truth is always relative to the side that is telling it, so this can become another very deep discussion about truth. The bottom line could be that the truth is what the majority thinks it is, but that does not mean every single person has to believe it and live in it. So the best thing that can exist is diversity, and in this way Google falls short; it is not its fault, it is just one algorithm.
Another thing I do not like is that they mention a small database could be built to narrow the initial search, moving on to the whole web only if the desired information is not found there. I do not like this because it would be like trying to find information based on what is available on your neighborhood block: you would not have the chance to reach more varied points of view, or information of higher quality than what is already familiar to you.
The race to be on the first page of results is good in the sense that it can produce a flow of ideas and even economic activity, but this kind of competition will always bring ideas and behaviors that are not exactly fair play.
Improvements on the algorithm: having seen some of the improvements that have been made on the original algorithm, it is difficult for me to add to them from my perspective. One idea is to apply IR to the positive comments that people make in the wiki/comments sections of certain pages, especially commercial pages. But this can also be manipulated: I have signed up on a site called shorttask.com, on which you can get paid to write good reviews about a certain subject on certain sites, so this too is subject to manipulation.
So I guess improvements could be made in the way the results are presented. The idea of mixing the best results from the first page with results from buried pages (like the 10th page) is a good one. But they do not have to be mixed; they can be shown in a different design, maybe parallel to the first page, sorted so that they can be visualized at the same level of hierarchy as the #1 page.
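As a rough sketch of that presentation idea, the top results and the buried ones could be rendered as parallel columns at the same level of hierarchy rather than merged into one ranked list. All result names here are invented placeholders.

```python
# Sketch: show first-page results next to "buried" (e.g. 10th-page) results
# as two parallel columns instead of one mixed list. Data is invented.

first_page = ["result_1", "result_2", "result_3"]
buried_page = ["result_91", "result_92", "result_93"]

rows = [f"{top:<12} | {deep}"
        for top, deep in zip(first_page, buried_page)]

for row in rows:
    print(row)
```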
Another good idea would be to mix results with those of other types of algorithms and other search engines in the same visualization. Google could provide different types of services for different types of searches, activated in a personalized way, so you could do the search and see the results differently according to how you have configured your Google search account. You could edit your seeds, for example, and allow searches to be run on other engines at the same time.
Google is a type of window that is very useful for some kinds of searches. It is not the only window, and it would be absurd to pretend to have a perfect one. That is why other search engines exist, like StumbleUpon, where you get different kinds of results when you are interested in a different type of search because you are in a different mood.