Instant Apache Solr for Indexing Data How-to

Book Description

Nobody pretends indexing data with Apache Solr is a walk in the park, but this book eases the path with plain-language explanations and engaging projects. Perfect for developers with sophisticated indexing ambitions.

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results

  • Take the most basic schema and extend it to support multi-lingual, multi-field searches

  • Make Solr pull data from a variety of existing sources

  • Discover different pathways to acquire and normalize data and content

In Detail

    Content and data searching is a very important part of the modern user experience, and before something can be searched, it has to be indexed. Indexing is a hidden part of the process that has a surprisingly strong impact on the overall user experience. From speed, to faceting, to multilingual support, everything depends on correct indexing.

    Instant Apache Solr for Indexing Data How-to is an example-driven guide that will take you on a journey from the basic collection of data to a multilingual, multi-field, multi-type schema. By the end of the book, you will know how to get your data ready for searches and how to tune the indexing process to support the search use cases you need.

    Instant Apache Solr for Indexing Data How-to is a friendly, practical guide that will show you how to index your data with Solr. This book will explain how Solr’s basic blocks actually work and fit together. You will then explore additional settings, pipelines, and configuration changes to achieve ever more complex goals. Next, you will cover how to push data into Solr and when to get Solr to pull the data. Finally, you will master indexing textual and binary content before enabling multilingual content to be searched.

    Table of Contents

    1. Instant Apache Solr for Indexing Data How-to
      1. Instant Apache Solr for Indexing Data How-to
      2. Credits
      3. About the Author
      4. About the Reviewer
      5. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      6. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
          5. Disclaimer
      7. 1. Instant Apache Solr for Indexing Data How-to
        1. Creating your first collection (Simple)
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Default collection
              1. CSV and JSON update handlers
              2. Scripting the Solr server startup
        2. Running several collections at once (Simple)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Manipulating cores from Admin WebUI
            2. Auto-discovery of Solr cores
        3. Importing multivalued fields (Simple)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Unique document keys
            2. Advanced CSV processing
            3. Indexing unexpected multivalued fields
        4. Using Solr's XML format (Simple)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Other operations possible in updated XML
            2. Using JSON format
        5. Indexing text (Intermediate)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Defining fields in bulk
            2. Versatility of copyField
        6. Indexing text – in depth (Advanced)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. The gory details
            2. Many types of boosting
            3. More about facets
            4. Adding libraries to Solr
        7. Indexing binary content on the server (Intermediate)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Date math
        8. Pulling data from XML with DataImportHandler (Intermediate)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Ignoring XML namespaces
            2. Clearing the old documents
        9. Pulling data from the database with DIH (Intermediate)
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
            1. Automatic field mapping
            2. Incremental update
            3. Incremental delete
            4. Enterprise data sources
        10. Commits and near real-time optimizations (Advanced)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Multiple entries with the get query
            2. Commit by document count
            3. Per-document commitWithin
        11. Using the UpdateRequestProcessor plugins (Intermediate)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Data Import Handler and Update Request Processors
            2. Script Update Request Processor
        12. Client indexing with Java (Intermediate)
          1. How to do it...
          2. How it works...
          3. There's more...
            1. Working in batches
        13. Atomic updates (Intermediate)
          1. How to do it...
          2. How it works...
        14. Indexing multiple languages (Advanced)
          1. How to do it...
          2. How it works...
            1. Getting the text into the right fields
            2. Searching the right fields
            3. Displaying the right fields
            4. Final touches
          3. There's more...
            1. Language identification algorithms
            2. Enhancing the default search handler
            3. Removing the catch-all (text) field