By Leonard Richardson and Lucas Carlson


13.5. Indexing Structured Text with Ferret

Problem

You want to perform searches on structured text. For instance, you might want to search just the headline of a news story, or just the body.

Discussion

The Ferret library can tokenize and search structured data. It's a pure Ruby port of Java's Lucene library, and it's available as the ferret gem.
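To make "structured" concrete, here is a toy model of field-aware indexing. Each token is filed under the field it came from, so a word in a document's name and the same word in another document's description produce separate entries. This is a minimal sketch in plain Ruby with no gem. `FieldIndex`, `tokenize`, and `lookup` are hypothetical names invented for illustration and are not Ferret's API. Ferret's real on-disk index is far more elaborate.

```ruby
# FieldIndex is a hypothetical class written for this sketch; it is a toy
# model of field-aware indexing, not part of Ferret's API.
class FieldIndex
  def initialize
    @docs = []
    # Maps a [field, term] pair to the ids of the documents that contain
    # that term in that particular field.
    @postings = Hash.new { |hash, key| hash[key] = [] }
  end

  # Lowercases a value and splits it into alphanumeric tokens.
  def self.tokenize(value)
    value.to_s.downcase.scan(/[a-z0-9]+/)
  end

  # Stores the document and files each of its tokens under the field it
  # came from, so the same word in two fields yields two separate entries.
  def <<(doc)
    id = @docs.size
    @docs << doc
    doc.each do |field, value|
      FieldIndex.tokenize(value).uniq.each { |term| @postings[[field, term]] << id }
    end
    self
  end

  # Returns the documents whose +field+ contains +term+ (case-insensitive).
  def lookup(field, term)
    @postings.fetch([field, term.downcase], []).map { |id| @docs[id] }
  end
end

packages = FieldIndex.new
packages << { :name => 'SimpleSearch', :description => 'A simple indexing library.' }
packages << { :name => 'Ferret',
              :description => 'A Ruby port of the Lucene library. More powerful than SimpleSearch' }

# "SimpleSearch" appears in one document's name and in the other's description.
p packages.lookup(:name, 'SimpleSearch').map { |doc| doc[:name] }        # => ["SimpleSearch"]
p packages.lookup(:description, 'SimpleSearch').map { |doc| doc[:name] } # => ["Ferret"]
```

Keying the postings on the `[field, term]` pair, rather than on the term alone, is what lets a search be restricted to one field.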

Here's how to create and populate an index with Ferret. I'll create a searchable index of useful Ruby packages, stored as a set of binary files in the ruby_packages/ directory.

	require 'rubygems'
	require 'ferret'

	PACKAGE_INDEX_DIR = 'ruby_packages/'
	Dir.mkdir(PACKAGE_INDEX_DIR) unless File.directory? PACKAGE_INDEX_DIR

	# Open (or create) the on-disk index; unqualified queries search
	# both the name and description fields.
	index = Ferret::Index::Index.new(:path => PACKAGE_INDEX_DIR,
	                                 :default_search_field => 'name|description')

	# Each document is a hash mapping field names to values.
	index << { :name => 'SimpleSearch',
	           :description => 'A simple indexing library.',
	           :supports_structured_data => false,
	           :complexity => 2 }
	index << { :name => 'Ferret',
	           :description => 'A Ruby port of the Lucene library. ' +
	                           'More powerful than SimpleSearch',
	           :supports_structured_data => true,
	           :complexity => 5 }

By default, queries against this index will search the "name" and "description" fields, but you can search against any field:

	index.search_each('library') do |doc_id, score|
	  puts index.doc(doc_id).field('name').data
	end
	# SimpleSearch
	# Ferret

	index.search_each('description:powerful AND supports_structured_data:true') do |doc_id, score|
	  puts index.doc(doc_id).field("name").data
	end
	# Ferret

	index.search_each("complexity:<5") ...
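The query strings above combine three ideas: bare terms that fall back to the default fields, `field:term` clauses restricted to one field, and a comparison on the `complexity` field. The following is a minimal sketch of how such clauses could be evaluated, assuming a linear scan over plain Ruby hashes instead of a real index. `DEFAULT_FIELDS`, `PACKAGES`, `tokens`, `clause_matches?`, and `search` are hypothetical names, and the numeric comparison for `<`/`>` is an assumption of this sketch; it is not Ferret's query parser, and Ferret may compare stored terms differently.

```ruby
# Hypothetical sketch: evaluates a tiny subset of Lucene-style query syntax
# against plain Ruby hashes by scanning every document. It supports bare
# terms (checked in DEFAULT_FIELDS), field:term, field:<N or field:>N
# (compared numerically, an assumption of this sketch), and AND.
# It is not Ferret's query parser.
DEFAULT_FIELDS = [:name, :description]

PACKAGES = [
  { :name => 'SimpleSearch', :description => 'A simple indexing library.',
    :supports_structured_data => false, :complexity => 2 },
  { :name => 'Ferret',
    :description => 'A Ruby port of the Lucene library. More powerful than SimpleSearch',
    :supports_structured_data => true, :complexity => 5 }
]

# Lowercases a field value and splits it into alphanumeric tokens.
def tokens(value)
  value.to_s.downcase.scan(/[a-z0-9]+/)
end

# Tests a single clause such as "library", "description:powerful" or "complexity:<5".
def clause_matches?(doc, clause)
  if (m = clause.match(/\A(\w+):([<>])(\d+)\z/))
    value = doc[m[1].to_sym]
    return false unless value.is_a?(Numeric)
    m[2] == '<' ? value < m[3].to_i : value > m[3].to_i
  elsif (m = clause.match(/\A(\w+):(.+)\z/))
    tokens(doc[m[1].to_sym]).include?(m[2].downcase)
  else
    DEFAULT_FIELDS.any? { |field| tokens(doc[field]).include?(clause.downcase) }
  end
end

# Splits the query on AND and returns the names of documents matching every clause.
def search(docs, query)
  clauses = query.split(/\s+AND\s+/)
  matches = docs.select { |doc| clauses.all? { |clause| clause_matches?(doc, clause) } }
  matches.map { |doc| doc[:name] }
end

p search(PACKAGES, 'library')       # => ["SimpleSearch", "Ferret"]
p search(PACKAGES, 'description:powerful AND supports_structured_data:true') # => ["Ferret"]
p search(PACKAGES, 'complexity:<5') # => ["SimpleSearch"]
```

The first two results agree with the output shown for the Ferret queries. The third reflects only this sketch's numeric comparison. A real index such as Ferret's avoids the full scan by consulting per-field postings instead of re-tokenizing every document.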
