Sorting Search Results

By default, documents are sorted by relevance, with ties broken by document ID. But what if we want to sort the result set by the value in one of the fields (e.g., price)? One way to do this is to retrieve the entire result set and sort it with Ruby’s Array#sort method. However, this would take too long for large result sets, not to mention use a lot of memory unnecessarily.

Searcher provides a :sort parameter for easy sorting. The easiest way to specify a sort is to pass a sort string: a comma-separated list of field names, each with an optional DESC modifier to reverse the sort for that field. The type of each field is detected automatically and the field is sorted accordingly, so Float fields are sorted by Float value and Integer fields by Integer value. SCORE and DOC_ID can be used in place of field names to sort by relevance and internal document ID, respectively. Here are some examples:

index.search(query, :sort => "title, year DESC")
index.search(query, :sort => "SCORE DESC, DOC_ID DESC")
index.search(query, :sort => "SCORE, rating DESC")
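
To make the sort string format concrete, the following plain-Ruby sketch splits a sort string into field names and reverse flags, in order of precedence. The parse_sort_string helper is hypothetical and written only for illustration; it is not Ferret's own parser and does not need Ferret to run.

```ruby
# Illustrative only: a hypothetical helper, not part of Ferret.
# Splits a sort string into [field_name, reversed?] pairs, keeping
# the left-to-right order of precedence.
def parse_sort_string(sort_string)
  sort_string.split(",").map do |part|
    name, modifier = part.strip.split(/\s+/, 2)
    [name, modifier.to_s.upcase == "DESC"]
  end
end

parsed = parse_sort_string("title, year DESC")
# parsed == [["title", false], ["year", true]]
```

Reading the result left to right gives the precedence: sort by title first, then by year in reverse order wherever titles are equal.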

Although this will do the job most of the time, you can be a little more explicit in describing how a result set is sorted by using the Sort API. You will also need to use the Sort API to take full advantage of sort caching. There are two classes in the Sort API: Sort and SortField.

SortField

A SortField describes how a particular field should be sorted. To create a SortField, you need to supply a field name and a sort type. You can also optionally reverse the sort. Table 4-2 shows the available sort types. Note that sort types are identified by Symbols.

Table 4-2. Sort types

Sort type    Description
:auto The default type used when we supply a string sort descriptor. Ferret will look at the first term in the field’s index to detect its type. It will sort the field either by integer, float, or string depending on that first term’s type. Be careful when using :auto to sort fields that have numbers in them. If, for example, you are sorting a field with television show titles, “24” would probably be the first term in the index, making Ferret think that the field is an integer field.
:integer Converts every term in the field to an integer and sorts by those integers.
:float Converts every term in the field to a float and sorts by those floats.
:string Performs a locale-sensitive sort on the field. You need to make sure you have your locale set correctly for this to work. If the locale is set to ASCII or ISO-8859-1 and the field is encoded in UTF-8, the field will be incorrectly sorted.
:byte Sorts terms by the order they appear in the index. This will work perfectly for ASCII data and is a lot faster than a string sort.
:doc_id Sorts documents by their internal document ID. For this type of SortField, a field name is not necessary.
:score Sorts documents by their relevance. This is how documents are sorted when no sort is specified. For this type of SortField, a field name is not necessary.
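
The difference between these sort types is easy to see in plain Ruby. The sketch below is an analogy rather than Ferret code: it uses Ruby's own String comparison and String#to_i as stand-ins for the :byte and :integer sorts, and Ferret's internal conversion of non-numeric terms may differ from what String#to_i does.

```ruby
# Plain-Ruby analogy for :byte versus :integer ordering (not Ferret code).
terms = ["9", "24", "100"]

byte_order    = terms.sort                    # compares strings byte by byte
integer_order = terms.sort_by { |t| t.to_i }  # compares numeric values

# byte_order    == ["100", "24", "9"]
# integer_order == ["9", "24", "100"]

# Why a misdetected title field is a problem: under Ruby's String#to_i,
# non-numeric titles all collapse to 0, so their relative order is lost.
titles      = ["Lost", "24", "Heroes"]
as_integers = titles.map { |t| t.to_i }
# as_integers == [0, 24, 0]
```

This is why a field of television show titles should be given an explicit :string or :byte type rather than left to :auto.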

The SortField class also has four constant SortField objects:

  • SortField::SCORE

  • SortField::DOC_ID

  • SortField::SCORE_REV

  • SortField::DOC_ID_REV

With these constants available, you generally won’t ever need to create a SortField with the type :score or :doc_id. Here are some examples of how to create SortFields:

title_sort = SortField.new(:title, :type => :string)
path_sort = SortField.new(:path, :type => :byte)
rating_sort = SortField.new(:rating, :type => :float, :reverse => true)

Sort

The Sort object is used to hold SortFields in order of precedence to sort a result set. It is relatively straightforward to use. It also allows you to completely reverse all SortFields in one go (so already reversed fields will be reversed back to normal). Here are a couple of examples:

title_sort = SortField.new(:title, :type => :string)
path_sort = SortField.new(:path, :type => :byte)
rating_sort = SortField.new(:rating, :type => :float, :reverse => true)

sort = Sort.new([title_sort, rating_sort, SortField::SCORE])
top_docs = index.search(query, :sort => sort)

# reverse all sort-fields.
sort = Sort.new([path_sort, SortField::DOC_ID_REV], true)
top_docs = index.search(query, :sort => sort)
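
The interaction between precedence, per-field reversal, and the overall reverse flag can be modeled in a few lines of plain Ruby. This is only a sketch of the behavior described above, not Ferret's implementation: sort_docs, the [field, reversed] pairs, and the sample documents are all hypothetical.

```ruby
# Hypothetical model of Sort semantics (not Ferret code).
# fields is a list of [field, reversed?] pairs in order of precedence;
# reverse_all flips every field, so already reversed fields flip back.
def sort_docs(docs, fields, reverse_all = false)
  docs.sort do |a, b|
    result = 0
    fields.each do |field, reversed|
      cmp = a[field] <=> b[field]
      next if cmp == 0                    # tie: defer to the next field
      cmp = -cmp if reversed ^ reverse_all
      result = cmp
      break
    end
    result
  end
end

docs = [
  { :title => "Alien",  :rating => 8.5 },
  { :title => "Alien",  :rating => 7.0 },
  { :title => "Brazil", :rating => 8.0 }
]
fields = [[:title, false], [:rating, true]]

normal  = sort_docs(docs, fields).map { |d| d[:rating] }
flipped = sort_docs(docs, fields, true).map { |d| d[:rating] }
# normal  == [8.5, 7.0, 8.0]  (title ascending, rating descending)
# flipped == [8.0, 7.0, 8.5]  (title descending, rating ascending)
```

In this model, reversing everything yields the exact mirror image of the original ordering, which is usually the reason to use the flag.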

The Sort class also has two constants: Sort::RELEVANCE and Sort::INDEX_ORDER. Sort::RELEVANCE orders documents by score, as is done by default in Ferret. Sort::INDEX_ORDER sorts a result set into the order in which the documents were added to the index.

Sorting by Date

Possibly one of the most common sorts to perform is a sort by date. We discussed how to store date fields for sorting in the “Date Fields” section in Chapter 2. If you have stored the date field correctly (in YYYYMMDD format), it is very simple to sort by this field. The best sort type to use is :byte because it will be the fastest to create the index and otherwise performs just as well as an integer sort. Using :auto, Ferret will sort the field by integer, which will be fine as well, so it is no problem using the sort string descriptor (e.g., “updated_on, created_on DESC”). Here is how you would explicitly create a Sort to sort a date field:

updated_on = SortField.new(:updated_on, :type => :byte)
created_on = SortField.new(:created_on, :type => :byte, :reverse => true)
sort = Sort.new([updated_on, created_on, SortField::DOC_ID])
index.search(query, :sort => sort)
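
The reason a :byte sort is safe for dates is that fixed-width YYYYMMDD strings sort the same way byte by byte, numerically, and chronologically. The plain-Ruby check below illustrates this with made-up dates; it uses Ruby's Date#strftime and does not involve Ferret.

```ruby
require "date"

# Sample dates, chosen arbitrarily for illustration.
dates = [Date.new(2007, 11, 5), Date.new(2006, 1, 30), Date.new(2007, 2, 14)]

# Zero-padded, fixed-width terms, as they would be stored in the date field.
terms = dates.map { |d| d.strftime("%Y%m%d") }

byte_sorted    = terms.sort                    # string comparison
integer_sorted = terms.sort_by { |t| t.to_i }  # numeric comparison
chronological  = dates.sort.map { |d| d.strftime("%Y%m%d") }

# All three orderings are ["20060130", "20070214", "20071105"].
```

The equivalence depends on the zero padding: a date written as 2006130 instead of 20060130 would break the byte ordering.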
