Apache Solr 4 Cookbook

Book Description

Apache Solr 4 can transform the effectiveness of your search engines, and this book will show you how. Jump straight into the hands-on recipes and get a fast understanding of the latest and greatest in open source search.

  • Learn how to make Apache Solr search faster, more complete, and comprehensively scalable

  • Solve performance, setup, configuration, analysis, and query problems in no time

  • Get to grips with and master the exciting new features of Apache Solr 4

In Detail

    Apache Solr is a blazing fast, scalable, open source enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, among numerous other features.

    "Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance, and index and analyze your data to provide better, more precise, and more useful search data.

    "Apache Solr 4 Cookbook" will make your search better, more accurate, and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

    With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4, including the awesome capabilities of SolrCloud.

    Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Apache Solr 4 Cookbook
      1. Table of Contents
      2. Apache Solr 4 Cookbook
      3. Credits
      4. About the Author
      5. Acknowledgement
      6. About the Reviewers
      7. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      8. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      9. 1. Apache Solr Configuration
        1. Introduction
        2. Running Solr on Jetty
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. I want Jetty to run on a different port
            2. Buffer size is too small
        3. Running Solr on Apache Tomcat
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Changing the port on which we see Solr running on Tomcat
        4. Installing a standalone ZooKeeper
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Clustering your data
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Choosing the right directory implementation
          1. How to do it...
          2. How it works...
        7. Configuring spellchecker to not use its own index
          1. How to do it...
          2. How it works...
          3. There's more...
            1. More than one spellchecker
        8. Solr cache configuration
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using a filter cache with faceting
            2. When we have no cache hits
            3. When we have more "puts" than "gets"
            4. Filter cache
            5. Query result cache
            6. Document cache
            7. Query result window
          5. See also
        9. How to fetch and index web pages
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Multiple thread crawling
          5. See also
        10. How to set up the extracting request handler
          1. How to do it...
          2. How it works...
          3. See also
        11. Changing the default similarity implementation
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Changing the global similarity
      10. 2. Indexing Your Data
        1. Introduction
        2. Indexing PDF files
          1. Getting ready
          2. How to do it...
          3. How it works...
        3. Generating unique fields automatically
          1. How to do it...
          2. How it works...
        4. Extracting metadata from binary files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. How to properly configure Data Import Handler with JDBC
          1. How to do it...
          2. How it works...
          3. There's more...
        6. Indexing data from a database using Data Import Handler
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. How to import data using Data Import Handler and delta query
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. How to use Data Import Handler with the URL data source
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. How to modify data while importing with Data Import Handler
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        10. Updating a single field of your document
          1. How to do it...
          2. How it works...
        11. Handling multiple currencies
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Setting up your own currency provider
        12. Detecting the document's language
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Language identification based on Apache Tika
        13. Optimizing your primary key field indexing
          1. How to do it...
          2. How it works...
      11. 3. Analyzing Your Text Data
        1. Introduction
        2. Storing additional information using payloads
          1. How to do it...
          2. How it works...
        3. Eliminating XML and HTML tags from text
          1. How to do it...
          2. How it works...
        4. Copying the contents of one field to another
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Copying contents of dynamic fields to one field
            2. Limiting the number of characters copied
        5. Changing words to other words
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Equivalent synonyms setup
        6. Splitting text by CamelCase
          1. How to do it...
          2. How it works...
        7. Splitting text by whitespace only
          1. How to do it...
          2. How it works...
        8. Making plural words singular without stemming
          1. How to do it...
          2. How it works...
          3. There's more...
        9. Lowercasing the whole string
          1. How to do it...
          2. How it works...
        10. Storing geographical points in the index
          1. How to do it...
          2. How it works...
        11. Stemming your data
          1. How to do it...
          2. How it works...
          3. There's more...
        12. Preparing text to perform an efficient trailing wildcard search
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
        13. Splitting text by numbers and non-whitespace characters
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
        14. Using Hunspell as a stemmer
          1. Getting ready
          2. How to do it...
          3. How it works...
        15. Using your own stemming dictionary
          1. Getting ready
          2. How it works...
        16. Protecting words from being stemmed
          1. Getting started
          2. How to do it...
          3. How it works...
      12. 4. Querying Solr
        1. Introduction
        2. Asking for a particular field value
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Querying for a particular value using the DisMax query parser
            2. Querying for multiple values in the same field
        3. Sorting results by a field value
          1. How to do it...
          2. How it works...
        4. How to search for a phrase, not a single word
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Defining the distance between words in a phrase
        5. Boosting phrases over words
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        6. Positioning some documents over others on a query
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Excluding documents with QueryElevationComponent
          4. See also
        7. Positioning documents with words closer to each other first
          1. How to do it...
          2. How it works...
        8. Sorting results by a distance from a point
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Getting documents with only a partial match
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Affecting scoring with functions
          1. How to do it...
          2. How it works...
          3. See also
        11. Nesting queries
          1. How to do it...
          2. How it works...
        12. Modifying returned documents
          1. How to do it...
          2. How it works...
        13. Using parent-child relationships
          1. How to do it...
          2. How it works...
        14. Ignoring typos in terms of performance
          1. How to do it...
          2. How it works...
        15. Detecting and omitting duplicate documents
          1. How to do it...
          2. How it works...
        16. Using field aliases
          1. How to do it...
          2. How it works...
        17. Returning a value of a function in the results
          1. Getting ready
          2. How to do it...
          3. How it works...
      13. 5. Using the Faceting Mechanism
        1. Introduction
        2. Getting the number of documents with the same field value
          1. How to do it...
          2. How it works...
          3. There's more...
        3. Getting the number of documents with the same value range
          1. How to do it...
          2. How it works...
        4. Getting the number of documents matching the query and subquery
          1. How to do it...
          2. How it works...
        5. Removing filters from faceting results
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Sorting faceting results in alphabetical order
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Implementing the autosuggest feature using faceting
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Suggesting words not whole phrases
        8. Getting the number of documents that don't have a value in the field
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Having two different facet limits for two different fields in the same query
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Using decision tree faceting
          1. How to do it...
          2. How it works...
        11. Calculating faceting for relevant documents in groups
          1. Getting ready
          2. How to do it...
          3. How it works...
      14. 6. Improving Solr Performance
        1. Introduction
        2. Paging your results quickly
          1. How to do it...
          2. How it works...
        3. Configuring the document cache
          1. How to do it...
          2. How it works...
        4. Configuring the query result cache
          1. How to do it...
          2. How it works...
        5. Configuring the filter cache
          1. How to do it...
          2. How it works...
        6. Improving Solr performance right after the startup or commit operation
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Improving Solr performance after commit operations
        7. Caching whole result pages
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Improving faceting performance for low cardinality fields
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Specifying faceting method per field
        9. What to do when Solr slows down during indexing
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Commit after a set amount of documents
            2. Commit within a set amount of time
        10. Analyzing query performance
          1. How to do it...
          2. How it works...
        11. Avoiding filter caching
          1. How to do it...
          2. How it works...
        12. Controlling the order of execution of filter queries
          1. Getting ready
          2. How to do it...
          3. How it works...
        13. Improving the performance of numerical range queries
          1. How to do it...
          2. How it works...
      15. 7. In the Cloud
        1. Introduction
        2. Creating a new SolrCloud cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Starting the embedded ZooKeeper server
        3. Setting up two collections inside a single cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Managing your SolrCloud cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Understanding the SolrCloud cluster administration GUI
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Distributed indexing and searching
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Increasing the number of replicas on an already live cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Stopping automatic document distribution among shards
          1. Getting ready
          2. How to do it...
          3. How it works...
      16. 8. Using Additional Solr Functionalities
        1. Introduction
        2. Getting more documents similar to those returned in the results list
          1. How to do it...
          2. How it works...
        3. Highlighting matched words
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Specifying the fields for highlighting
            2. Changing the default HTML tags that surround the matched word
        4. How to highlight long text fields and get good performance
          1. How to do it...
          2. How it works...
        5. Sorting results by a function value
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Searching words by how they sound
          1. How to do it...
          2. How it works...
          3. See also
        7. Ignoring defined words
          1. How to do it...
          2. How it works...
        8. Computing statistics for the search results
          1. How to do it...
          2. How it works...
        9. Checking the user's spelling mistakes
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Using field values to group results
          1. How to do it...
          2. How it works...
          3. There's more...
            1. More than a single document in a group
        11. Using queries to group results
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. Using function queries to group results
          1. Getting ready
          2. How to do it...
          3. How it works...
      17. 9. Dealing with Problems
        1. Introduction
        2. How to deal with too many opened files
          1. How to do it...
          2. How it works...
        3. How to deal with out-of-memory problems
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Monitoring heap when an out-of-memory error occurs
            2. Reducing the amount of memory needed by Solr
        4. How to sort non-English languages properly
          1. How to do it...
          2. How it works...
        5. How to make your index smaller
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Estimating your index size and memory usage
        6. Diagnosing Solr problems
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. How to avoid swapping
          1. Getting ready
          2. How to do it...
          3. How it works...
      18. A. Real-life Situations
        1. Introduction
        2. How to implement a product's autocomplete functionality
          1. How to do it...
          2. How it works...
        3. How to implement a category's autocomplete functionality
          1. How to do it...
          2. How it works...
        4. How to use different query parsers in a single query
          1. How to do it...
          2. How it works...
        5. How to get documents right after they were sent for indexation
          1. How to do it...
          2. How it works...
        6. How to search your data in a near real-time manner
          1. How to do it...
          2. How it works...
        7. How to get the documents with all the query words to the top of the results set
          1. How to do it...
          2. How it works...
        8. How to boost documents based on their publishing date
          1. How to do it...
          2. How it works...
          3. There's more...
      19. Index