Prefetch Yahoo! Search Results

Automatically prefetch and cache the first search result on Yahoo! Web Search.

If you know how to use them properly, search engines are pretty darn good at finding exactly the page you’re looking for. Google is so confident in its algorithm that it includes a hidden attribute in the search results page that tells Firefox to prefetch the first search result and cache it. You’re probably going to click on the first result anyway, and when you do, it will load almost instantaneously because your browser has already been there.

Yahoo! Web Search is pretty good too, but it doesn’t yet have this particular feature. So let’s add it.

Tip

This hack relies on the Greasemonkey extension and thus works only in Firefox. If you’re interested in doing much more with Greasemonkey, see Mark Pilgrim’s forthcoming Greasemonkey Hacks, from which this hack is excerpted.

To begin, you’ll need to install the Greasemonkey extension for Firefox. If you don’t already have it, browse to http://greasemonkey.mozdev.org and click the Install Greasemonkey link. Follow the Software Installation prompts and then restart your browser. You’ll know the extension is working if you see a small monkey icon in the lower-right corner of Firefox. Once it is installed, you can move on to analyzing Yahoo! and building the Greasemonkey script.

There are two important things about Yahoo! Search results that you can discover by viewing source on the search results page. First, each search result link has the class yschttl. Yahoo! uses this class for styling the links with CSS, but you can use it to find the links in the first place. A single XPath query can extract a list of all the links with the class yschttl, and the first one of those is the one we want to prefetch and cache.

The second thing you need to know is that the search results Yahoo! provides are actually redirects through a tracking script on rds.yahoo.com that records which link you clicked on. A sample link looks like this:

	http://rds.yahoo.com/S=2766679/K=gpl+compatible/v=2/SID=e/TID=F510_112/  
	l=WS1/R=2/IPC=us/SHE=0/H=1/SIG=11sgv1lum/EXP=1116517280/*-http%3A//  
	www.gnu.org/licenses/gpl-faq.html

To save time and bandwidth, and to avoid skewing Yahoo’s tracking statistics, this user script will extract the target URL out of the first search result link before requesting it. The target URL is always at the end of the tracking URL, after the *-, with characters such as colons (:) escaped into their hexadecimal equivalents. Here’s the target URL in the previous example:

	http://www.gnu.org/licenses/gpl-faq.html
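
As a rough sketch of that extraction step, the snippet below wraps the same regular expression that the user script in this hack relies on in a small standalone function. The name extractTargetUrl is made up for illustration, and the use of decodeURIComponent in place of unescape is an assumption about what suits a current JavaScript engine, not what the hack itself does.

```javascript
// Hypothetical helper (not part of the original hack): strips the
// rds.yahoo.com tracking prefix and decodes the escaped target URL.
function extractTargetUrl(trackingUrl) {
    // Drop everything up to and including the last "*-" marker.
    var escapedTarget = trackingUrl.replace(/^.*\*-/, '');
    // Turn escapes such as %3A back into literal characters.
    return decodeURIComponent(escapedTarget);
}

// The sample tracking link from the text, joined into one string.
var sampleLink = 'http://rds.yahoo.com/S=2766679/K=gpl+compatible/v=2/' +
    'SID=e/TID=F510_112/l=WS1/R=2/IPC=us/SHE=0/H=1/SIG=11sgv1lum/' +
    'EXP=1116517280/*-http%3A//www.gnu.org/licenses/gpl-faq.html';

console.log(extractTargetUrl(sampleLink));
// prints http://www.gnu.org/licenses/gpl-faq.html
```

A URL with no *- marker passes through the replace call untouched, so the function is harmless on links that are not redirects, as long as they contain no percent-escapes of their own.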

When I say “prefetch and cache,” there is really only one step: prefetch. By default, Firefox automatically caches pages according to HTTP’s caching directives and your browser preferences. For this script to have the desired effect, make sure your browser preferences allow pages to be cached. Open a new window or tab, go to about:config, and double-check the following preferences:

 * browser.cache.disk.enable /* should be "true" */
 * browser.cache.check_doc_frequency /* should be 0, 2, or 3 */
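
If you would rather pin these values than check them by hand, one option is a user.js file in your Firefox profile folder, which Firefox reads at startup. The fragment below is only a sketch and not part of the original hack; the value 3 is simply one of the three acceptable settings listed above.

```javascript
// user.js: a configuration fragment for the Firefox profile folder.
// It is read by Firefox at startup, not run as a user script.

// Keep the disk cache turned on so prefetched pages can be stored.
user_pref("browser.cache.disk.enable", true);

// 3 asks Firefox to revalidate a cached page only when it looks out of
// date; 0 (once per session) and 2 (never) also work for this hack.
user_pref("browser.cache.check_doc_frequency", 3);
```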

Tip

about:config shows you all your browser preferences, even ones that are not normally configurable through the Options dialog. Type part of a preference name (such as browser.cache) in the Filter box to narrow the list of displayed preferences.

The Code

Save the following user script as yahooprefetch.user.js:

	// ==UserScript==
	// @name Yahoo! Prefetcher
	// @namespace http://www.oreilly.com/catalog/greasemonkeyhks/
	// @description prefetch first link on Yahoo! web search results
	// @include http://search.yahoo.com/search*
	// ==/UserScript==

	// Find the first search result link (class "yschttl").
	var elmFirstResult = document.evaluate("//a[@class='yschttl']", document,
		null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
	if (!elmFirstResult) return;
	// Strip the rds.yahoo.com tracking prefix and decode the target URL.
	var urlFirstResult = unescape(elmFirstResult.href.replace(/^.*\*-/, ''));
	// Request the target page in the background so Firefox can cache it.
	var oRequest = {
		method: 'GET',
		url: urlFirstResult,
		headers: {'X-Moz': 'prefetch',
		          'Referer': location.href}};
	GM_log('prefetching ' + urlFirstResult);
	GM_xmlhttpRequest(oRequest);

Running the Hack

To verify that the script is working properly, you’ll need to clear your browser cache. You don’t need to do this every time, just once to prove to yourself that the script is doing something. To clear your cache, go to the Tools menu and select Options; then go to the Privacy tab and click the Clear button next to Cache.

Now, install the user script from Tools → Install User Script, and then go to http://search.yahoo.com and search for gpl compatible. The prefetching happens in the background after the page is fully loaded, so wait for a second or two after the search results come up. There won’t be any visible indication on screen that Firefox is prefetching the link. You might see some additional activity on your modem or network card, but it’s hard to separate this from the activity of loading the rest of the Yahoo! Search results page.
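
Because nothing shows up on screen, one way to get some feedback is to give the request onload and onerror callbacks and log the outcome. The sketch below is a variation under stated assumptions, not the script from the book: buildPrefetchRequest is a hypothetical helper name, and the logging function is passed in as an argument so that the snippet also runs outside Greasemonkey.

```javascript
// Hypothetical variation on the request object used in the user script.
function buildPrefetchRequest(url, referrer, log) {
    return {
        method: 'GET',
        url: url,
        headers: {'X-Moz': 'prefetch', 'Referer': referrer},
        // Called when the response arrives; report the HTTP status.
        onload: function (response) {
            log('prefetched ' + url + ' (HTTP ' + response.status + ')');
        },
        // Called when the request fails at the network level.
        onerror: function () {
            log('prefetch failed for ' + url);
        }
    };
}

// Inside the user script, GM_log and GM_xmlhttpRequest come from
// Greasemonkey and urlFirstResult from the script itself; anywhere
// else this guard simply skips the request.
if (typeof GM_xmlhttpRequest === 'function' &&
        typeof urlFirstResult === 'string') {
    GM_xmlhttpRequest(
        buildPrefetchRequest(urlFirstResult, location.href, GM_log));
}
```

With this in place, the messages passed to GM_log should appear in Firefox's JavaScript Console once the background request finishes.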

Open a new browser window or tab and go to about:cache. This displays information about Firefox’s browser cache. Under “Disk cache device,” click List Cache Entries. You should see a key for http://www.gnu.org/philosophy/license-list.html. This is the result of Firefox prefetching and caching the first Yahoo! Search result. Click that URL to see specific information about the cache entry, as shown in Figure 1-23.

Figure 1-23. Information about a prefetched page

Hacking the Hack

By now you should realize that this prefetching technique can be used anywhere, with any links. Do you use some other search engine, perhaps a site-specific search engine such as the Microsoft Developer Network (MSDN)? You can apply the same technique to those search results.

For example, going to http://msdn.microsoft.com and searching for active accessibility takes you to a search results page at this URL:

	http://search.microsoft.com/search/results.aspx?qu=active+accessibility&  
	View=msdn&st=b&c=0&s=1&swc=0

If you view source on this page, you will see that the result links are contained within a <div class="results"> element. This means that the first result can be found with this XPath query:

	var elmFirstResult = document.evaluate(
		"//div[@class='results']//a[@href]", document,
		null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;

Unlike with Yahoo! Search results, these search result links are not redirected through a tracking script, so you will need to change this line:

	var urlFirstResult = unescape(elmFirstResult.href.replace(/^.*\*-/, ''));

to this:

	var urlFirstResult = elmFirstResult.href;

The rest of the script will work unchanged.
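
If you want one script that copes with both kinds of links, a sketch along the following lines could decide per link whether there is a tracking prefix to strip. The function name resolveTargetUrl is hypothetical, and treating the presence of *- as the sign of a Yahoo!-style redirect is an assumption that may not hold for other sites.

```javascript
// Hypothetical helper: returns the URL that should be prefetched for a
// given result link, whether or not it goes through a tracking redirect.
function resolveTargetUrl(href) {
    var marker = href.lastIndexOf('*-');
    if (marker === -1) {
        // No tracking prefix (as with the MSDN results): use the link as is.
        return href;
    }
    // Yahoo!-style redirect: keep what follows "*-" and decode it.
    return unescape(href.substring(marker + 2));
}

var msdnLink = 'http://search.microsoft.com/search/results.aspx' +
    '?qu=active+accessibility&View=msdn&st=b&c=0&s=1&swc=0';
console.log(resolveTargetUrl(msdnLink) === msdnLink);
// prints true
```

In the user script, the line that sets urlFirstResult would then become a call to resolveTargetUrl(elmFirstResult.href).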

Mark Pilgrim
