Apache Solr 4 Cookbook

Apache Solr 4 Cookbook by Rafał Kuć, published by Packt Publishing

Configuring spellchecker to not use its own index

If you are used to the way the spellchecker worked in previous Solr versions, you may remember that it required its own index to provide spelling corrections. That approach had some disadvantages, such as the need to rebuild the spellchecker index and to replicate it between master and slave servers. Solr 4.0 introduced a new spellchecker implementation, solr.DirectSolrSpellChecker. It lets you use your main index to provide spelling suggestions and doesn't need to be rebuilt after every commit. So now, let's see how to use that new spellchecker implementation in Solr.

How to do it...

First of all, let's assume we have a field in the index called title, in which we hold titles of ...
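As a rough illustration of the idea described in the introduction, a spellchecker backed by the main index could be declared in solrconfig.xml along the following lines. This is only a minimal sketch, not the recipe's actual listing: the component name spellcheck, the spellchecker name direct, the text_general field type, and the tuning values are assumptions chosen for the example, and the title field is the one assumed above.

```xml
<!-- Sketch only: names and values below are illustrative assumptions -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Field type whose query analyzer is applied to the misspelled input;
       text_general is assumed to exist in schema.xml -->
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <!-- Hypothetical name used to select this spellchecker at query time -->
    <str name="name">direct</str>
    <!-- Suggestions are read from this field of the main index -->
    <str name="field">title</str>
    <!-- No separate spellchecker index is built or replicated -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- Example tuning values; adjust to your data -->
    <float name="accuracy">0.8</float>
    <int name="maxEdits">1</int>
    <int name="minPrefix">1</int>
  </lst>
</searchComponent>
```

With a definition like this, the component would still have to be attached to a request handler (for example through its last-components list) before queries with spellcheck=true could return suggestions.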