Indexing Non-String Datatypes

So far, we’ve only really talked about adding strings to the index. As far as Ferret is concerned, every field is a string. But sometimes we want to index other datatypes, such as numbers and dates. We’re going to take a moment to talk about best practices when indexing non-string datatypes, specifically when each special datatype is stored in its own field. We won’t cover how to handle numbers or dates embedded in a larger string field (as in the string “The 39 Steps”). You’ll learn more about text-field analysis in Chapter 5.

Number Fields

Indexing number fields is relatively straightforward. You don’t even need to convert them to strings when you add them to the document. However, you do need to think about how you set up the field. Make sure it is untokenized, as some Analyzers will strip all numbers and you’ll end up with an empty field:

index << {:product => "widget", :price => 24.95, :weight => 2400}

The one exception is when you want to run range queries on a number field; in that case you do need to think about the string form of each number. For example, you may want to submit a query for all products between $5.00 and $25.00 or for all products that weigh less than 500 grams. In Ferret, the RangeQuery compares field values lexicographically, that is, as strings, so while 200 comes before 500, 70 comes after 500. To fix this, pad the numbers to a fixed width by prepending zeros. So instead of adding 5, 70, and 200, you would add 0005, 0070, and 0200, and instead of adding 3.45 and 101.95, you would add 0003.45 and 0101.95. This is pretty easy ...
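As a rough illustration of the padding idea, here is a minimal sketch in plain Ruby. The helper names pad_integer and pad_decimal are hypothetical and are not part of Ferret, and the widths (four integer digits, two decimal places) are assumptions taken from the sample values above. The sketch also assumes non-negative numbers, since a leading minus sign would break the string ordering:

```ruby
# Hypothetical helpers for illustration only; they are not part of Ferret.

# Pad a non-negative integer with leading zeros to a fixed width.
def pad_integer(number, width = 4)
  format("%0#{width}d", number)
end

# Pad a non-negative decimal. `width` is the number of integer digits and
# `places` the number of decimal digits; the total adds one for the point.
def pad_decimal(number, width = 4, places = 2)
  total = width + 1 + places
  format("%0#{total}.#{places}f", number)
end

weights = [5, 70, 200, 500]

# Unpadded strings sort lexicographically, so "70" lands after "500".
puts weights.map(&:to_s).sort.inspect

# Padded strings sort in the same order as the numbers they represent.
puts weights.map { |w| pad_integer(w) }.sort.inspect

puts pad_decimal(3.45)
puts pad_decimal(101.95)
```

The padded strings would then be added to the document in place of the raw numbers, and any range query would need its bounds padded the same way so that both sides compare as strings of equal width.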
