Custom Analysis
The Tokenizers and Analyzers that come with Ferret cater to most needs most of the time. However, there may come a time when Ferret's standard Analysis classes fall short and you need to implement your own. In the following example, we'll show you how to build an Analyzer that automatically pads numbers to a fixed width so that they will be correctly sorted for use with a RangeQuery or RangeFilter:

    require 'ferret'

    module Ferret::Analysis
      # Emits a single token: the input number zero-padded to a fixed width.
      class IntegerTokenizer
        def initialize(num, width)
          @num = num.to_i
          @width = width
        end

        # Return the padded token once, then nil to signal the end of the stream.
        def next
          token = Token.new("%0#{@width}d" % @num, 0, @width) if @num
          @num = nil
          return token
        end

        # Reset the tokenizer with new input.
        def text=(text)
          @num = text.to_i
        end
      end

      # Analyzer that pads every value it is given to a fixed number of digits.
      class IntegerAnalyzer
        def initialize(width)
          @width = width
        end

        def token_stream(field, input)
          return IntegerTokenizer.new(input, @width)
        end
      end
    end

    include Ferret::Analysis

    # Use the padding analyzer for :padded only; other fields use StandardAnalyzer.
    analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)
    analyzer[:padded] = IntegerAnalyzer.new(5)
    index = Ferret::I.new(:analyzer => analyzer)

    [5, 50, 500, 5000, 50000].each do |number|
      index << {:padded => number, :unpadded => number}
    end

    puts "padded: " + index.search('padded:[10 10000]').to_s
    puts "unpadded: " + index.search('unpadded:[10 10000]').to_s

If you run this example, you'll see that the RangeQuery worked on the :padded field, even though you didn't explicitly pad the numbers as you added them to the field or the numbers in the RangeQuery itself. The :unpadded field query, on the other hand, failed miserably, because its terms are compared as strings rather than as numbers.
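To see why padding makes the difference, it may help to reproduce the comparison in plain Ruby, without Ferret. The sketch below assumes that a range query compares indexed terms to its bounds as strings. The helper names pad and terms_in_range are hypothetical and exist only for this illustration; they are not part of Ferret.

```ruby
# Plain-Ruby sketch (no Ferret required) of why zero-padding matters.
# Assumption: indexed terms and range bounds are compared as strings.

WIDTH = 5
NUMBERS = [5, 50, 500, 5000, 50000]

# Hypothetical helper that mirrors the "%0#{@width}d" format used by IntegerTokenizer.
def pad(number, width = WIDTH)
  "%0#{width}d" % number.to_i
end

# Hypothetical helper: keep the terms that fall within [lower, upper]
# using String comparison, which is lexicographic.
def terms_in_range(terms, lower, upper)
  terms.select { |term| term.between?(lower, upper) }
end

# Unpadded terms: "5", "50", ... all sort after "10000", so nothing matches.
unpadded_hits = terms_in_range(NUMBERS.map(&:to_s), "10", "10000")

# Padded terms: string order now agrees with numeric order.
padded_hits = terms_in_range(NUMBERS.map { |n| pad(n) }, pad(10), pad(10000))

puts "unpadded: #{unpadded_hits.inspect}"
puts "padded:   #{padded_hits.inspect}"
```

Under that assumption, the unpadded list comes back empty, while the padded list contains "00050", "00500", and "05000". Note that the format string does not truncate: a number with more digits than the chosen width keeps all of its digits, so the width should be at least as large as the longest value you expect to index.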
In this chapter, we have covered ...