4.3. Designing the Database

Five database tables are used for the project. The first two store the search engine's configuration options.

+--------------+--------------+------+
| Field        | Type         | Null |
+--------------+--------------+------+
| DOCUMENT_URL | varchar(255) | NO   |
+--------------+--------------+------+

+--------------+--------------+------+
| Field        | Type         | Null |
+--------------+--------------+------+
| TERM_VALUE   | varchar(255) | NO   |
+--------------+--------------+------+

The WROX_SEARCH_CRAWL table collects the addresses for the indexer to retrieve and include in the index. Essentially, this is a list of all files a user may search through using the search engine. WROX_SEARCH_STOP_WORD stores a list of stop words that should not be included in the index. Typically these are articles, pronouns, prepositions, and other words that have little value as search terms.

It would be nice if MySQL exposed its internal list of stop words using a special SHOW or SELECT query because then an INSERT-SELECT statement could pre-populate the table with data. You could then use this list as a starting point to add or filter various words as needed. However, this isn't the case so you must develop your own list of stop words.

The next three tables construct the inverted index to be fed by the indexer and searched by the front end of the search engine. An inverted index is a common data structure used in search algorithms to optimize the speed of the query. The trade-off between this ...

Get PHP and MySQL®: Create-Modify-Reuse now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.