Book description

With the introduction of Ferret, Ruby users now have one of the fastest and most flexible search libraries available. And it's surprisingly easy to use.

This book will show you how to quickly get up and running with Ferret. You'll learn how to index different document types such as PDF, Microsoft Word, and HTML, as well as how to deal with foreign languages and different character encodings. The book describes the Ferret Query Language in detail, along with the object-oriented approach to building queries.

You will also be introduced to sorting, filtering, and highlighting your search results, with an explanation of exactly how you need to set up your index to perform these tasks. Finally, you will learn how to optimize a Ferret index for lightning-fast indexing and split-second query results.

Table of Contents

  1. Ferret
    1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
    2. Preface
      1. Conventions Used in This Book
      2. Using Code Examples
      3. Safari® Enabled
      4. How to Contact Us
    3. 1. Getting Started
      1. Installing Ferret
      2. A Quick Example: Indexing the Filesystem
      3. Summary
    4. 2. Indexing
      1. Index Storage
      2. Documents, Fields, and Boosts
        1. Documents
        2. Fields
        3. Boosts
      3. Setting Up the Index
        1. FieldInfo
          1. :store
          2. :index
          3. :term_vector
        2. FieldInfos
      4. Basic Indexing Operations
        1. Add
        2. Get
        3. Delete
        4. Update
      5. Indexing Non-String Datatypes
        1. Number Fields
        2. Date Fields
        3. Sort Fields
      6. Summary
    5. 3. Advanced Indexing
      1. How the Indexing Process Works
      2. Tuning Indexing Performance
        1. In-Memory Indexing
        2. Indexing Parameters
          1. :max_buffer_memory and :chunk_size
          2. :merge_factor
          3. :max_buffered_docs
          4. :max_merged_docs
          5. :max_field_length
          6. :use_compound_file
          7. :index_skip_interval
          8. :doc_skip_interval
          9. Indexing parameter testing
        3. Parallel Indexing
      3. Optimizing the Index
      4. Index Locking and Concurrency Issues
        1. Multithreaded Environment
        2. Multiprocess Environment
      5. Summary
    6. 4. Search
      1. Overview of Searching Classes
        1. IndexSearcher
        2. Query
        3. QueryParser
        4. Filter
        5. Sort
      2. Building Queries
        1. TermQuery
        2. BooleanQuery
        3. PhraseQuery
        4. RangeQuery
        5. MultiTermQuery
        6. PrefixQuery
        7. WildcardQuery
        8. FuzzyQuery
        9. MatchAllQuery
        10. ConstantScoreQuery
        11. FilteredQuery
        12. Span Queries
          1. SpanTermQuery
          2. SpanFirstQuery
          3. SpanOrQuery
          4. SpanNotQuery
          5. SpanNearQuery
        13. Boosting Queries
      3. QueryParser
        1. Setting Up the QueryParser
        2. Ferret Query Language
          1. TermQuery
          2. BooleanQuery
          3. PhraseQuery
          4. RangeQuery
          5. WildcardQuery
          6. FuzzyQuery
          7. Boosting a query in FQL
      4. Filtering Search Results
        1. Using the RangeFilter
        2. Using the QueryFilter
        3. Writing Your Own Filter
        4. :filter_proc, the New Filter
      5. Sorting Search Results
        1. SortField
        2. Sort
        3. Sorting by Date
      6. Highlighting Query Results
      7. Summary
    7. 5. Analysis
      1. Token
      2. TokenStream
        1. Tokenizer
          1. WhiteSpaceTokenizer
          2. LetterTokenizer
          3. StandardTokenizer
          4. RegExpTokenizer
        2. TokenFilter
          1. LowerCaseFilter
          2. StopFilter
          3. StemFilter
          4. HyphenFilter
      3. Analyzer
        1. StandardAnalyzer
        2. PerFieldAnalyzer
      4. Custom Analysis
    8. 6. Ferret in Practice
      1. Indexing Multiple Document Types
        1. TextReader
        2. HtmlReader
        3. OOoReader (OpenOffice.org Reader)
        4. JpegReader
        5. Mp3Reader
        6. PdfReader
      2. Other Indexing Improvements
      3. Search Improvements
      4. Putting It All Together
      5. Summary
    9. Index
    10. About the Author
    11. Colophon