The underlying principle
The underlying principle is very simple. Suppose you have some text files that you want to be able to search. All the text contained in these files would be tokenized (broken into words and atomic units if you will) and a list (called an index
) of these words is created that has all these words in a sorted order along with their position and location in each file.
So suppose that we have three files like this:
ages.txt The age of Internet is upon us. tech.txt Technology changes often in age of Internet. goal.txt To build a technology product.
Now all these files would be tokenized and each word (called a term
) would have its corresponding location like this:
age [ages.txt:5, tech.txt:27] build [goal.txt: 3] change [tech.txt: ...
Get Mastering Google App Engine now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.