Solr Cookbook - Third Edition

Book Description

Solve real-time problems related to Apache Solr 4.x and 5.0 effectively with the help of over 100 easy-to-follow recipes

In Detail

Starting with vital information on setting up Solr, you will quickly progress to analyzing your text data, querying, and improving performance.

With the help of intermediate and advanced recipes, you will learn how to index data and query Solr. Then, you will dive deep into faceting and learn how to improve Solr's performance. You will also work with SolrCloud clusters and get to grips with the advanced functionalities of Solr. Finally, you will explore real-life situations where Solr can be used to simplify daily collection handling. By the end of this book, you will be able to produce enhanced, optimized, and powerful results by implementing pro-level practices and techniques.

What You Will Learn

  • Acquire the skills needed to index your data in different formats, forms, and sources

  • Overcome common problems while analyzing your data

  • Use the faceting mechanism to get aggregated information about your data

  • Improve your Solr instance and Solr cluster performance

  • Get to know how to configure and use SolrCloud

  • Make use of the highlighting and document grouping functionalities

  • Diagnose and resolve problems with Solr instances and clusters

  • Implement different autocomplete functionalities

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Solr Cookbook Third Edition
      1. Table of Contents
      2. Solr Cookbook Third Edition
      3. Credits
      4. About the Author
      5. Acknowledgments
      6. About the Reviewers
      7. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      8. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Sections
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Conventions
        6. Reader feedback
        7. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      9. 1. Apache Solr Configuration
        1. Introduction
        2. Running Solr on a standalone Jetty
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. I want Jetty to run on a different port
            2. Buffer size is too small
        3. Installing ZooKeeper for SolrCloud
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Migrating configuration from master-slave to SolrCloud
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Choosing the proper directory configuration
          1. How to do it...
          2. How it works...
        6. Configuring the Solr spellchecker
          1. How to do it...
          2. How it works...
          3. There's more...
            1. More than one spellchecker
        7. Using Solr in a schemaless mode
          1. How to do it...
          2. How it works...
        8. Limiting I/O usage
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Using core discovery
          1. How to do it...
          2. How it works...
          3. There's more...
        10. Configuring SolrCloud for NRT use cases
          1. How to do it...
          2. How it works...
        11. Configuring SolrCloud for high-indexing use cases
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. Configuring SolrCloud for high-querying use cases
          1. Getting ready
          2. How to do it...
          3. How it works...
        13. Configuring the Solr heartbeat mechanism
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Enabling and disabling the heartbeat mechanism
        14. Changing similarity
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Changing the global similarity
      10. 2. Indexing Your Data
        1. Introduction
        2. Indexing PDF files
          1. How to do it...
          2. How it works...
        3. Counting the number of fields
          1. How to do it...
          2. How it works...
        4. Using parsing update processors to parse data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        5. Using scripting update processors to modify documents
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Indexing data from a database using Data Import Handler
          1. How to do it...
          2. How it works...
          3. There's more...
            1. How to change the default behavior of deleting index contents at the beginning of a full import
        7. Incremental imports with DIH
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Transforming data when using DIH
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using scripts other than JavaScript
        9. Indexing multiple geographical points
          1. How to do it...
          2. How it works...
          3. See also
        10. Updating document fields
          1. How to do it...
          2. How it works...
        11. Detecting the document language during indexation
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Language identification based on Apache Tika
        12. Optimizing the primary key indexation
          1. How to do it...
          2. How it works...
          3. See also
        13. Handling multiple currencies
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Setting up your own currency provider
      11. 3. Analyzing Your Text Data
        1. Introduction
        2. Using the enumeration type
          1. How to do it...
          2. How it works...
        3. Removing HTML tags during indexing
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Preserving defined tags
          4. See also
        4. Storing data outside of Solr index
          1. How to do it...
          2. How it works...
        5. Using synonyms
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Equivalent synonyms setup
          4. See also
        6. Stemming different languages
          1. How to do it...
          2. How it works...
          3. There's more...
        7. Using nonaggressive stemmers
          1. How to do it...
          2. How it works...
          3. There's more...
        8. Using the n-gram approach to do performant trailing wildcard searches
          1. How to do it...
          2. How it works...
        9. Using position increment to divide sentences
          1. How to do it...
          2. How it works...
        10. Using patterns to replace tokens
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Using solr.PatternReplaceCharFilterFactory
      12. 4. Querying Solr
        1. Introduction
        2. Understanding and using the Lucene query language
          1. How to do it...
          2. How it works...
          3. See also
        3. Using position aware queries
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Too many generated queries
        4. Using boosting with autocomplete
          1. How to do it...
          2. How it works...
        5. Phrase queries with shingles
          1. How to do it...
          2. How it works...
          3. See also
        6. Handling user queries without errors
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Handling hierarchies with nested documents
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Returning children documents in the query
        8. Sorting data on the basis of a function value
          1. How to do it...
          2. How it works...
        9. Controlling the number of terms needed to match
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        10. Affecting document score using function queries
          1. How to do it...
          2. How it works...
          3. See also
        11. Using simple nested queries
          1. How to do it...
          2. How it works...
        12. Using the Solr document query join functionality
          1. How to do it...
          2. How it works...
        13. Handling typos with n-grams
          1. How to do it...
          2. How it works...
        14. Rescoring query results
          1. How to do it...
          2. How it works...
      13. 5. Faceting
        1. Introduction
        2. Getting the number of documents with the same field value
          1. How to do it...
          2. How it works...
          3. There's more...
            1. How to show facets with counts greater than zero
            2. Lexicographical sorting of the faceting results
        3. Getting the number of documents with the same value range
          1. How to do it...
          2. How it works...
        4. Getting the number of documents matching the query and subquery
          1. How to do it...
          2. How it works...
        5. Removing filters from faceting results
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Using decision tree faceting
          1. How to do it...
          2. How it works...
        7. Calculating faceting for relevant documents in groups
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Improving faceting performance for low cardinality fields
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Using per segment field cache for faceting calculation
            2. Specifying the number of faceting threads
      14. 6. Improving Solr Performance
        1. Introduction
        2. Handling deep paging efficiently
          1. How to do it...
          2. How it works...
          3. See also
        3. Configuring the document cache
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Configuring the query result cache
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Configuring the filter cache
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Improving Solr query performance after the start and commit operations
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Improving Solr performance after committing operations
        7. Lowering the memory consumption of faceting and sorting
          1. How to do it...
          2. How it works...
        8. Speeding up indexing with Solr segment merge tuning
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Increasing the RAM buffer size to improve the indexing throughput
            2. Speeding up querying with merge policy tuning
          4. See also
        9. Avoiding caching of rare filters to improve the performance
          1. How to do it...
          2. How it works...
        10. Controlling the filter execution to improve expensive filter performance
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. Configuring numerical fields for high-performance sorting and range queries
          1. How to do it...
          2. How it works...
          3. See also
      15. 7. In the Cloud
        1. Introduction
        2. Creating a new SolrCloud cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Starting an embedded ZooKeeper server
            2. Specifying the Solr server name
        3. Setting up multiple collections on a single cluster
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Splitting shards
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Having more than a single shard from a collection on a node
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Creating a collection on defined nodes
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Adding replicas after collection creation
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Removing replicas
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Moving shards between nodes
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Using aliasing
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. Using routing
          1. Getting ready
          2. How to do it...
          3. How it works...
      16. 8. Using Additional Functionalities
        1. Introduction
        2. Finding similar documents
          1. How to do it...
          2. How it works...
        3. Highlighting fragments found in documents
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Changing the default HTML tags that surround the matched content
        4. Efficient highlighting
          1. How to do it...
          2. How it works...
        5. Using versioning
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Retrieving information about the index structure
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Retrieving the index structure information in XML
            2. Retrieving information about dynamic fields
            3. Retrieving information about copy fields
          4. See also
        7. Altering the index structure on a live collection
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Grouping documents by the field value
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Having more than a single document in a group
            2. Modifying the number of returned groups
        9. Grouping documents by the query value
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Grouping documents by the function value
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. Efficient documents grouping using the post filter
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Expanding collapsed groups
      17. 9. Dealing with Problems
        1. Introduction
        2. Dealing with the too many opened files exception
          1. How to do it...
          2. How it works...
        3. Diagnosing and dealing with memory problems
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Seeing heap when out of memory error occurs
        4. Configuring sorting for non-English languages
          1. How to do it...
          2. How it works...
        5. Migrating data to another collection
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. SolrCloud read-side fault tolerance
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Defining the achieved replication factor
        7. Using the check index functionality
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Checking the index without the repair procedure
        8. Adjusting the Jetty configuration to avoid deadlocks
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Tuning segment merging
          1. How to do it...
          2. How it works...
          3. See also
        10. Avoiding swapping
          1. Getting ready
          2. How to do it...
          3. How it works...
      18. 10. Real-life Situations
        1. Introduction
        2. Implementing the autocomplete functionality for products
          1. How to do it...
          2. How it works...
        3. Implementing the autocomplete functionality for categories
          1. How to do it...
          2. How it works...
        4. Handling time-sliced data using aliases
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Deleting an alias
        5. Boosting words closer to each other
          1. How to do it...
          2. How it works...
        6. Using the Solr spellchecking functionality
          1. Getting ready
          2. How to do it...
          3. How it works...
        7. Using the Solr administration panel for monitoring
          1. How to do it...
          2. How it works...
          3. There's more...
            1. SPM Performance Monitoring & Alerting
        8. Automatically expiring Solr documents
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Changing the time to live parameter name
        9. Exporting whole query results
          1. How to do it...
          2. How it works...
      19. Index