Documents, Fields, and Boosts

Documents

The best way to think of an index is as a searchable array of documents. A Ferret document is a collection of fields representing a chunk of data that you want to make searchable. Whether that chunk of data is a database row, a Word document, or an MP3 file doesn’t matter. They are all just documents to Ferret. A Ferret document can be represented by the Ferret::Document class. This class extends Ruby’s Hash class, adding only a boost attribute. In fact, as you saw in Example 1-2, documents can also be Hashes, where the key is the name of the field and the value is the data stored in the field.
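
To make the "Hash plus a boost" idea concrete, here is a minimal sketch in plain Ruby. It does not load Ferret. SketchDocument is a hypothetical stand-in written only for this illustration, and its default boost of 1.0 is an assumption, not something taken from Ferret's source.

```ruby
# Illustrative stand-in only: SketchDocument is a hypothetical class that
# mimics the shape described above (a Hash with one extra attribute).
# It is NOT Ferret::Document and does not talk to an index.
class SketchDocument < Hash
  # The single addition on top of Hash.
  attr_accessor :boost

  # The default boost of 1.0 is an assumption made for this sketch.
  def initialize(boost = 1.0)
    super()
    @boost = boost
  end
end

# Fields are ordinary Hash entries: the key is the field name and the
# value is the data stored in that field.
doc = SketchDocument.new
doc[:title]   = "Programming Ruby"
doc[:content] = "A guide to the Ruby language"
doc.boost     = 2.0 # mark this document as more important

# The same fields as a plain Hash, which carries no boost of its own.
plain = { title: "Programming Ruby", content: "A guide to the Ruby language" }
```

Because the sketch inherits from Hash, everything a Hash can do (indexing by field name, iterating over fields, listing keys) works on it unchanged. The only thing the plain Hash cannot express is the boost.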

The term “document” can be confusing, because we need to distinguish the idea of a document in an index from the Document class that implements it. A document can represent a PDF or a text file, or it can represent something like a movie or a product. Note the formatting we use to tell the two apart: the lowercase “document” refers to the concept, and the capitalized Document refers to the class.

Earlier we mentioned that Documents have a boost attribute, but we didn’t say what boost was for. The boost attribute gives a document a higher weighting in the results of a search. By using the boost attribute, you can make more important documents ...
