This chapter covers:
Searching with Lucene
Calculating the PageRank vector
Large-scale computing constraints
Let's say that you have a list of documents and you're interested in reading about those that are related to the phrase "Armageddon is near"—or perhaps something less macabre. How would you implement a solution to that problem? A brute force, and naïve, solution would be to read each document and keep only those in which you can find the term "Armageddon is near." You could even count how many times you found each of the words in your search term within each of the documents and sort them according to that count in descending order. That exercise is called information retrieval (IR) or simply searching. Searching isn't ...