Apache Solr High Performance

Book Description

When setting up Apache Solr, you’ll want it to return the best possible search results with maximum efficiency. This book shows you how to achieve that with a comprehensive tutorial that includes troubleshooting.

In Detail

Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To return relevant results faster, you need robust techniques that optimize the performance of your Solr instances, and you need to be able to troubleshoot the issues that tend to arise while maintaining Solr.

Apache Solr High Performance is a practical guide that will help you explore and take full advantage of the robust nature of Apache Solr so as to achieve optimized Solr instances, especially in terms of performance.

You will learn everything you need to know to achieve a high-performing Solr instance or set of instances, as well as how to troubleshoot the common problems you are likely to face while working with single or multiple Solr servers.

The book begins by explaining the prerequisites of Apache Solr and how to install it and integrate it with the additional components it requires, then gradually progresses to the features that make Solr flexible enough to achieve high performance in various circumstances. It goes on to cover several clear and highly practical concepts that will help you further optimize your Solr instances’ performance on both single and multiple servers, and shows how to troubleshoot common problems that tend to arise while using your Solr instance. By the end of the book you will also have learned how to set up, configure, and deploy ZooKeeper, and you will know more about other applications of ZooKeeper.

You will also learn how to handle data in multiple server environments, run searches based on specific geographical coordinates, apply different caching techniques, and use various algorithms and formulae that enable better performance, among other topics.

What You Will Learn

  • Boost your search based on scores, the DisMax query parser, and function queries.
  • Explore performance metrics and implement different kinds of Solr caching, such as document, query result, filter, and whole result page caching.
  • Index and search across shards, and use near real-time searching.
  • Get to grips with additional performance optimization activities such as fetching documents similar to the ones queried, searching homophones, or filtering searches on the basis of specific keywords.
  • Efficiently troubleshoot common problems in multiple server environments, such as corrupt and locked indexes, memory issues, expensive garbage collection, and infinite loop exceptions.
  • Set up, configure, and deploy various applications of ZooKeeper to optimize Solr's performance.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Apache Solr High Performance
      1. Table of Contents
      2. Apache Solr High Performance
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Installing Solr
        1. Prerequisites for Solr
          1. Installing components
        2. Summary
      9. 2. Boost Your Search
        1. Scoring
          1. Query-time and index-time boosting
            1. Index-time boosting
            2. Query-time boosting
          2. Troubleshoot queries and scores
        2. The dismax query parser
          1. Lucene DisjunctionMaxQuery
          2. Autophrase boosting
            1. Configuring autophrase boosting
            2. Configuring the phrase slop
            3. Boosting a partial phrase
          3. Boost queries
          4. Boost functions
            1. Boost addition and multiplication
        3. Function queries
          1. Field references
          2. Function references
          3. Mathematical operations
          4. The ord() and rord() functions
          5. Other functions
          6. Boosting the function query
          7. Logarithm
          8. Reciprocal
          9. Linear
          10. Inverse reciprocal
        4. Summary
      10. 3. Performance Optimization
        1. Solr performance factors
        2. Solr caching
          1. Document caching
          2. Query result caching
          3. Filter caching
          4. Result pages caching
        3. Using SolrCloud
          1. Creating a SolrCloud cluster
          2. Multiple collections within a cluster
          3. Managing a SolrCloud cluster
          4. Distributed indexing and searching
          5. Stopping automatic document distribution
        4. Near real-time search
        5. Summary
      11. 4. Additional Performance Optimization Techniques
        1. Documents similar to those returned in the search result
        2. Sorting results by function values
        3. Searching for homophones
        4. Ignore the defined words from being searched
        5. Summary
      12. 5. Troubleshooting
        1. Dealing with the corrupt index
        2. Reducing the file count in the index
        3. Dealing with the locked index
        4. Truncating the index size
        5. Dealing with a huge count of open files
        6. Dealing with out-of-memory issues
        7. Dealing with an infinite loop exception in shards
        8. Dealing with expensive garbage collection
        9. Bulk updating a single field without full indexation
        10. Summary
      13. 6. Performance Optimization with ZooKeeper
        1. Getting familiar with ZooKeeper
          1. Prerequisites for a distributed server
          2. Aid your distributed system using ZooKeeper
          3. Setting an ideal node count for ZooKeeper
        2. Setting up, configuring, and deploying ZooKeeper
          1. Setting up ZooKeeper
          2. Configuring ZooKeeper
          3. Deploying ZooKeeper
        3. Applications of ZooKeeper
        4. Summary
      14. A. Resources
      15. Index