Apache Solr Search Patterns

Book Description

Leverage Apache Solr to power up your business by guiding your users to their data quickly and efficiently.

In Detail

Apache Solr is an open source search platform built on a Java library called Lucene. It serves as the search platform for many websites because it can index and search content across multiple sites to fetch the desired results.

We begin with a brief introduction to analyzers and tokenizers to understand the challenges associated with implementing large-scale indexing and multilingual search functionality. We then move on to working with custom queries and understanding how filters work internally. While doing so, we also create our own query language, implemented as a Solr plugin, that does proximity searches. Furthermore, we discuss how Solr can be used for real-time analytics and tackle problems faced during its implementation in e-commerce search. We then dive deep into Solr's spatial features, such as indexing strategies and search and filtering strategies for a spatial search. We also do an in-depth analysis of the problems faced in an ad serving platform and how Solr can be used to solve them.

What You Will Learn

  • Customize the Solr scoring algorithm to get better and more relevant search results

  • Use Solr with big data for analytical purposes

  • Get insights into Solr internals—indexing and search

  • Set up and scale with SolrCloud

  • Implement spatial search with Solr

  • Understand Finite State Transducers (FST) and implement text tagging using FST

  • Breeze through the strategies used in executing search using Solr in e-commerce, advertising, and real estate websites

  • Learn more about how to use Solr with AJAX

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account. If you purchased this book elsewhere, you can register to have the files e-mailed directly to you.

    Table of Contents

    1. Apache Solr Search Patterns
      1. Table of Contents
      2. Apache Solr Search Patterns
      3. Credits
      4. About the Author
      5. About the Reviewers
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Solr Indexing Internals
        1. The job site problem statement – Solr indexing fundamentals
        2. Working of analyzers, tokenizers, and filters
        3. Handling a multilingual search
        4. Measuring the quality of search results
        5. The e-commerce problem statement
        6. The job site problem statement
        7. Challenges of large-scale indexing
          1. Using multiple threads for indexing on Solr
          2. Using the Java binary format of data for indexing
          3. Using the ConcurrentUpdateSolrServer class for indexing
            1. Solr configuration changes that can improve indexing performance
          4. Planning your commit strategy
          5. Using better hardware
          6. Distributed indexing
        8. The SolrCloud solution
        9. Summary
      9. 2. Customizing the Solr Scoring Algorithm
        1. Relevance calculation
        2. Building a custom scorer
        3. Drawbacks of the TF-IDF model
        4. The information gain model
        5. Implementing the information gain model
        6. Options to TF-IDF similarity
          1. BM25 similarity
          2. DFR similarity
        7. Summary
      10. 3. Solr Internals and Custom Queries
        1. Working of a scorer on an inverted index
        2. Working of OR and AND clauses
        3. The eDisMax query parser
          1. Working of the eDisMax query parser
          2. The minimum should match parameter
          3. Working of filters
        4. Using BRS queries instead of DisMax
        5. Building a custom query parser
          1. Proximity search using SWAN queries
          2. Creating a parboiled parser
          3. Building a Solr plugin for SWAN queries
          4. Integrating the SWAN plugin in Solr
        6. Summary
      11. 4. Solr for Big Data
        1. Introduction to big data
        2. Getting data points using facets
          1. Field faceting
          2. Query and range faceting
        3. Radius faceting for location-based data
          1. The geofilt filter
          2. The bounding box filter
          3. The rectangle filter
          4. Distance function queries
          5. Radius faceting
        4. Data analysis using pivot faceting
        5. Graphs for analytics
          1. Getting started with Highcharts
          2. Displaying Solr data using Highcharts
        6. Summary
      12. 5. Solr in E-commerce
        1. Designing an e-commerce search
        2. Handling unclean data
        3. Handling variations in the product
        4. Sorting
        5. Problems and solutions of flash sale searches
        6. Faceting with the option of multi-select
        7. Faceting with hierarchical taxonomy
        8. Faceting with size
        9. Implementing semantic search
        10. Optimizations
        11. Summary
      13. 6. Solr for Spatial Search
        1. Features of spatial search
          1. Java Topology Suite
          2. Well-known Text
          3. The Spatial4j library
        2. Lucene 4 spatial module
          1. SpatialRecursivePrefixTreeFieldType
          2. BBoxField (to be introduced in Solr 4.10)
        3. Indexing for spatial search
        4. Searching and filtering on a spatial index
          1. The bbox query
        5. Distance sort and relevancy boost
        6. Advanced concepts
          1. Quadtree
            1. Indexing data
            2. Searching data
          2. Geohash
        7. Summary
      14. 7. Using Solr in an Advertising System
        1. Ad system functionalities
        2. Architecture of an ad distribution system
        3. Requirements of an ad distribution system
          1. Schema for a listing ad
          2. Schema for targeted ads
        4. Performance improvements
          1. fieldCache
          2. fieldValueCache
          3. documentCache
          4. filterCache
          5. queryResultCache
          6. Application cache
          7. Garbage collection
        5. Merging Solr with Redis
        6. Summary
      15. 8. AJAX Solr
        1. The purpose of AJAX Solr
        2. The AJAX Solr architecture
          1. The Manager controller
          2. The ParameterStore model
            1. Available parameters
            2. Exposed parameters
            3. Using the ParameterHashStore class
            4. Extending the ParameterStore class
          3. Widgets
        3. Working with AJAX Solr
          1. Talking to AJAX Solr
          2. Displaying the result
          3. Adding facets
          4. Adding pagination
          5. Adding a tag cloud
        4. Performance tuning
        5. Summary
      16. 9. SolrCloud
        1. The SolrCloud architecture
        2. Centralized configuration
        3. Setting up SolrCloud
          1. Test setup for SolrCloud
          2. Setting up SolrCloud in production
            1. Setting up the Zookeeper ensemble
            2. Setting up Tomcat with Solr
        4. Distributed indexing and search
        5. Routing documents to a particular shard
        6. Adding more nodes to the SolrCloud
        7. Fault tolerance and high availability in SolrCloud
        8. Advanced sharding with SolrCloud
          1. Shard splitting
          2. Deleting a shard
          3. Moving the existing shard to a new node
          4. Shard splitting based on split key
        9. Asynchronous calls
        10. Migrating documents to another collection
        11. Sizing and monitoring of SolrCloud
        12. Using SolrCloud as a NoSQL database
        13. Summary
      17. 10. Text Tagging with Lucene FST
        1. An overview of FST and text tagging
        2. Implementation of FST in Lucene
        3. Text tagging algorithms
          1. Fuzzy string matching algorithm
            1. The Levenshtein distance algorithm
            2. Damerau–Levenshtein distance
        4. Using Solr for text tagging
        5. Implementing a text tagger using Solr
        6. Summary
      18. Index