Apache Solr 3.1 Cookbook

Book Description

Over 110 highly effective recipes to turbo-charge the user interface of any web-enabled Internet application and web page

  • Improve the way in which you work with Apache Solr to make your search engine quicker and more effective

  • Deal with performance, setup, and configuration problems in no time

  • Discover little-known Solr functionalities and create your own modules to customize Solr to your company's needs

  • Part of Packt's Cookbook series; each chapter covers a different aspect of working with Solr

In Detail

    Apache Solr is a fast, scalable, modern, open source, and easy-to-use search engine. It allows you to develop a professional search engine for your ecommerce site, web application, or back office software. Setting up Solr is easy, but configuring it to get the most out of your site is the difficult bit.

    The Solr 3.1 Cookbook will make your everyday work easier by using real-life examples that show you how to deal with the most common problems that can arise while using the Apache Solr search engine. Why waste your time searching the Internet for solutions when you can have all the answers in one place?

    This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying and performance improvement to developing your own modules. The practical recipes will help you quickly solve common problems with data analysis, show you how to use faceting to collect data, and help you speed up Solr's performance. You will learn about functionalities that most newcomers are unaware of, such as sorting results by a function value, highlighting matched words, and computing statistics, to make your work with Solr easy and stress-free.

    This practical guide shows you how to get the most out of Apache Solr 3.1, with recipes for improving your search engine's performance, analyzing data quickly and efficiently, and customizing the search server with your own modules.

    Table of Contents

    1. Apache Solr 3.1 Cookbook
      1. Table of Contents
      2. Apache Solr 3.1 Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Apache Solr Configuration
        1. Introduction
        2. Running Solr on Jetty
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. I want Jetty to run on a different port
            2. Buffer size is too small
        3. Running Solr on Apache Tomcat
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Changing the port on which we see Solr running on Tomcat
        4. Using the Suggester component
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Suggestions from a static dictionary
            2. Rebuilding the suggestion word base after commit
            3. Removing uncommon words from suggestions
          4. See also
        5. Handling multiple languages in a single index
          1. How to do it...
          2. How it works...
          3. See also
        6. Indexing fields in a dynamic way
          1. How to do it...
          2. How it works...
          3. See also
        7. Making multilingual data searchable with multicore deployment
          1. How to do it...
          2. How it works...
          3. There's more...
            1. More information about core admin interface
          4. See also
        8. Solr cache configuration
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. Filter cache
          5. Query result cache
            1. Document cache
          6. Query result window
          7. There's more...
            1. Using filter cache with faceting
            2. When we have no cache hits
            3. When we have more "puts" than "gets"
          8. See also
        9. How to fetch and index web pages
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Multiple thread crawling
          5. See also
        10. Getting the most relevant results with early query termination
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. How to set up Extracting Request Handler
          1. How to do it...
          2. How it works...
          3. See also
      9. 2. Indexing your Data
        1. Introduction
        2. Indexing data in CSV format
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Splitting encapsulated data
        3. Indexing data in XML format
          1. How to do it...
          2. How it works...
        4. Indexing data in JSON format
          1. How to do it...
          2. How it works...
        5. Indexing PDF files
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Indexing Microsoft Office files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Extracting metadata from binary files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. How to properly configure Data Import Handler with JDBC
          1. How to do it...
          2. How it works...
          3. There's more...
            1. How to change the default behavior of deleting index contents at the beginning of a full import
        9. Indexing data from a database using Data Import Handler
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. How to import data using Data Import Handler and delta query
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. How to use Data Import Handler with URL Data Source
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. How to modify data while importing with Data Import Handler
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using scripts other than JavaScript
      10. 3. Analyzing your Text Data
        1. Introduction
        2. Storing additional information using payloads
          1. How to do it...
          2. How it works...
        3. Eliminating XML and HTML tags from the text
          1. How to do it...
          2. How it works...
        4. Copying the contents of one field to another
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Copying the contents of dynamic fields to one field
            2. Limiting the number of characters copied
        5. Changing words to other words
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Equivalent synonyms setup
        6. Splitting text by camel case
          1. How to do it...
          2. How it works...
        7. Splitting text by whitespace only
          1. How to do it...
          2. How it works...
        8. Making plural words singular, but without stemming
          1. How to do it...
          2. How it works...
          3. There's more...
        9. Lowercasing the whole string
          1. How to do it...
          2. How it works...
        10. Storing geographical points in the index
          1. How to do it...
          2. How it works...
          3. There's more...
        11. Stemming your data
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Alternative English stemmer
            2. Stemming other languages
        12. Preparing text to do efficient trailing wildcard search
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
        13. Splitting text by numbers and non-white space characters
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
      11. 4. Solr Administration
        1. Introduction
        2. Monitoring Solr via JMX
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Connecting to an existing JMX agent
            2. Connecting to an existing MBean server
            3. Running a remote JMX server
        3. How to check the cache status
          1. How to do it...
          2. How it works...
          3. See also
        4. How to check how the data type or field behave
          1. How to do it...
          2. How it works...
        5. How to check Solr query handler usage
          1. How to do it...
          2. How it works...
        6. How to check Solr update handler usage
          1. How to do it...
          2. How it works...
        7. How to change Solr instance logging configuration
          1. How to do it...
          2. How it works...
        8. How to check the Java based replication status
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. How to check the script based replication status
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Setting up a Java based index replication
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Slave and HTTP Basic authorization
            2. Changing the configuration file names when replicating
          4. See also
        11. Setting up script based replication
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        12. How to manage Java based replication status using HTTP commands
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Aborting index fetching
            2. Disabling replication
            3. Enabling replication
        13. How to analyze your index structure
          1. How to do it...
          2. How it works...
      12. 5. Querying Solr
        1. Introduction
        2. Asking for a particular field value
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Querying for a particular value using dismax query parser
            2. Querying for multiple values in the same field
        3. Sorting results by a field value
          1. How to do it...
          2. How it works...
        4. Choosing a different query parser
          1. How to do it...
          2. How it works...
        5. How to search for a phrase, not a single word
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Defining the distance between words in a phrase
        6. Boosting phrases over words
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Boosting phrases with standard query parser
        7. Positioning some documents over others on a query
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Excluding documents with QueryElevationComponent
        8. Positioning documents with words closer to each other first
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Phrase boosting using standard query parser
        9. Sorting results by a distance from a point
          1. How to do it...
          2. How it works...
          3. See also
        10. Getting documents with only a partial match
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. Affecting scoring with function
          1. How to do it...
          2. How it works...
          3. See also
        12. Nesting queries
          1. How to do it...
          2. How it works...
      13. 6. Using Faceting Mechanism
        1. Introduction
        2. Getting the number of documents with the same field value
          1. How to do it...
          2. How it works...
          3. There's more...
            1. How to show facets with counts greater than zero
            2. Lexicographical sorting of the faceting results
        3. Getting the number of documents with the same date range
          1. How to do it...
          2. How it works...
          3. See also
        4. Getting the number of documents with the same value range
          1. How to do it...
          2. How it works...
        5. Getting the number of documents matching the query and sub query
          1. How to do it...
          2. How it works...
        6. How to remove filters from faceting results
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. How to name different faceting results
          1. How to do it...
          2. How it works...
        8. How to sort faceting results in an alphabetical order
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Choosing the sort order in Solr earlier than 1.4
        9. How to implement the autosuggest feature using faceting
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Suggesting words, not whole phrases
        10. How to get the number of documents that don't have a value in the field
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. How to get all the faceting results, not just the first hundred ones
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. How to have two different facet limits for two different fields in the same query
          1. Getting ready
          2. How to do it...
          3. How it works...
      14. 7. Improving Solr Performance
        1. Introduction
        2. Paging your results quickly
          1. How to do it...
          2. How it works...
        3. Configuring the document cache
          1. How to do it...
          2. How it works...
        4. Configuring the query result cache
          1. How to do it...
          2. How it works...
        5. Configuring the filter cache
          1. How to do it...
          2. How it works...
          3. See also
        6. Improving Solr performance right after the startup or commit operation
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Improving Solr performance after commit operations
        7. Setting up a sharded deployment
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Dealing with queries taking too much time
        8. Caching whole result pages
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Improving faceting performance
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. What to do when Solr slows down during indexing when using Data Import Handler
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Commit after a set amount of documents
            2. Commit within a set amount of time
        11. Getting the first top documents fast when having millions of them
          1. Getting ready
          2. How to do it...
          3. How it works...
      15. 8. Creating Applications that use Solr and Developing your Own Solr Modules
        1. Introduction
        2. Choosing a different response format than the default one
          1. How to do it...
          2. How it works...
        3. Using Solr with PHP
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Using Solr with Ruby
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Using SolrJ to query Solr
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Developing your own request handler
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Developing your own filter
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Developing your own search component
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Developing your own field type
          1. Getting ready
          2. How to do it...
          3. How it works...
      16. 9. Using Additional Solr Functionalities
        1. Introduction
        2. Getting more documents similar to those returned in the results list
          1. How to do it...
          2. How it works...
        3. Presenting search results in a fast and easy way
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Running Solritas on Solr 1.4.1 or 1.4
          4. See also
        4. Highlighting matched words
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Specifying the fields for highlighting
            2. Changing the default HTML tags that surround the matched word
        5. How to highlight long text fields and get good performance
          1. How to do it...
          2. How it works...
        6. Sorting results by a function value
          1. How to do it...
          2. How it works...
        7. Searching words by how they sound
          1. How to do it...
          2. How it works...
          3. See also
        8. Ignoring defined words
          1. How to do it...
          2. How it works...
        9. Computing statistics for the search results
          1. How to do it...
          2. How it works...
        10. Checking user's spelling mistakes
          1. How to do it...
          2. How it works...
          3. See also
        11. Using "group by" like functionalities in Solr
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Fetching more than one document in a group
      17. 10. Dealing with Problems
        1. Introduction
        2. How to deal with a corrupted index
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Checking the index without the repair procedure
        3. How to reduce the number of files the index is made of
          1. How to do it...
          2. How it works...
        4. How to deal with a locked index
          1. How to do it...
          2. How it works...
        5. How to deal with too many opened files
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. How to deal with out of memory problems
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Seeing heap when out of memory error occurs
        7. How to sort non-English languages properly
          1. How to do it...
          2. How it works...
          3. See also
        8. How to deal with the infinite loop exception when using shards
          1. How to do it...
          2. How it works...
        9. How to deal with garbage collection running too long
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Monitoring the garbage collector
          4. See also
        10. How to update only one field in all documents without the need of full indexation
          1. How to do it...
          2. How it works...
        11. How to make your index smaller
          1. How to do it...
          2. How it works...
          3. See also
      18. Index