Elasticsearch Server Second Edition

Book Description

From creating your own index structure through to cluster monitoring and troubleshooting, this is the complete guide to implementing the Elasticsearch search engine on your own websites, packed with real-life examples.

In Detail

This book begins by introducing the most commonly used Elasticsearch server functionality, from creating your own index structure, through querying, faceting, and aggregations, and ends with cluster monitoring and problem diagnosis. As you progress through the book, you will cover topics such as starting Elasticsearch, creating a new index, and designing its structure. After that, you'll read about the query API that Elasticsearch exposes, as well as its filtering capabilities, aggregations, and faceting. You will also learn how to find similar documents and how to implement application alerts by using the prospective search functionality called percolator. Advanced topics such as shard allocation control, gateway configuration, and the discovery module are also discussed. Finally, the book shows how to monitor cluster state and health, and how to use third-party tools.

What You Will Learn

  • Configure and create your own index
  • Set up an analysis chain and handle multilingual data
  • Use the Elasticsearch query DSL to make all kinds of queries
  • Utilize filters efficiently and ensure they do not affect performance
  • Implement autocomplete functionality
  • Employ faceting, the aggregations framework, and similar functionality to get more from your search and improve your clients' search experience
  • Monitor your cluster state and health by using Elasticsearch APIs as well as third-party monitoring solutions
  • Learn what gateway and discovery modules are, and how to properly configure them
  • Control primary shards and replica rebalancing

Table of Contents

    1. Elasticsearch Server Second Edition
      1. Table of Contents
      2. Elasticsearch Server Second Edition
      3. Credits
      4. About the Author
      5. Acknowledgments
      6. About the Author
      7. Acknowledgments
      8. About the Reviewers
      9. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      10. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      11. 1. Getting Started with the Elasticsearch Cluster
        1. Full-text searching
          1. The Lucene glossary and architecture
          2. Input data analysis
            1. Indexing and querying
          3. Scoring and query relevance
        2. The basics of Elasticsearch
          1. Key concepts of data architecture
            1. Index
            2. Document
            3. Document type
            4. Mapping
          2. Key concepts of Elasticsearch
            1. Node and cluster
            2. Shard
            3. Replica
            4. Gateway
          3. Indexing and searching
        3. Installing and configuring your cluster
          1. Installing Java
          2. Installing Elasticsearch
          3. Installing Elasticsearch from binary packages on Linux
            1. Installing Elasticsearch using the RPM package
            2. Installing Elasticsearch using the DEB package
          4. The directory layout
          5. Configuring Elasticsearch
          6. Running Elasticsearch
          7. Shutting down Elasticsearch
          8. Running Elasticsearch as a system service
            1. Elasticsearch as a system service on Linux
            2. Elasticsearch as a system service on Windows
        4. Manipulating data with the REST API
          1. Understanding the Elasticsearch RESTful API
          2. Storing data in Elasticsearch
          3. Creating a new document
            1. Automatic identifier creation
          4. Retrieving documents
          5. Updating documents
          6. Deleting documents
          7. Versioning
            1. An example of versioning
            2. Using the version provided by an external system
        5. Searching with the URI request query
          1. Sample data
          2. The URI request
            1. The Elasticsearch query response
            2. Query analysis
            3. URI query string parameters
              1. The query
              2. The default search field
              3. Analyzer
              4. The default operator
              5. Query explanation
              6. The fields returned
              7. Sorting the results
              8. The search timeout
              9. The results window
              10. The search type
              11. Lowercasing the expanded terms
              12. Analyzing the wildcard and prefixes
          3. The Lucene query syntax
        6. Summary
      12. 2. Indexing Your Data
        1. Elasticsearch indexing
          1. Shards and replicas
          2. Creating indices
            1. Altering automatic index creation
            2. Settings for a newly created index
        2. Mappings configuration
          1. Type determining mechanism
            1. Disabling field type guessing
          2. Index structure mapping
            1. Type definition
            2. Fields
            3. Core types
              1. Common attributes
              2. String
              3. Number
              4. Boolean
              5. Binary
              6. Date
            4. Multifields
            5. The IP address type
            6. The token_count type
            7. Using analyzers
              1. Out-of-the-box analyzers
              2. Defining your own analyzers
              3. Analyzer fields
              4. Default analyzers
          3. Different similarity models
            1. Setting per-field similarity
            2. Available similarity models
              1. Configuring DFR similarity
              2. Configuring IB similarity
          4. The postings format
            1. Configuring the postings format
          5. Doc values
            1. Configuring the doc values
            2. Doc values formats
        3. Batch indexing to speed up your indexing process
          1. Preparing data for bulk indexing
          2. Indexing the data
          3. Even quicker bulk requests
        4. Extending your index structure with additional internal information
          1. Identifier fields
          2. The _type field
          3. The _all field
          4. The _source field
            1. Exclusion and inclusion
          5. The _index field
          6. The _size field
          7. The _timestamp field
          8. The _ttl field
        5. Introduction to segment merging
          1. Segment merging
          2. The need for segment merging
          3. The merge policy
          4. The merge scheduler
          5. The merge factor
          6. Throttling
        6. Introduction to routing
          1. Default indexing
          2. Default searching
          3. Routing
          4. The routing parameters
          5. Routing fields
        7. Summary
      13. 3. Searching Your Data
        1. Querying Elasticsearch
          1. The example data
          2. A simple query
          3. Paging and result size
          4. Returning the version value
          5. Limiting the score
          6. Choosing the fields that we want to return
            1. The partial fields
          7. Using the script fields
            1. Passing parameters to the script fields
        2. Understanding the querying process
          1. Query logic
          2. Search types
          3. Search execution preferences
          4. The Search shards API
        3. Basic queries
          1. The term query
          2. The terms query
          3. The match_all query
          4. The common terms query
          5. The match query
            1. The Boolean match query
            2. The match_phrase query
            3. The match_phrase_prefix query
          6. The multi_match query
          7. The query_string query
            1. Running the query_string query against multiple fields
          8. The simple_query_string query
          9. The identifiers query
          10. The prefix query
          11. The fuzzy_like_this query
          12. The fuzzy_like_this_field query
          13. The fuzzy query
          14. The wildcard query
          15. The more_like_this query
          16. The more_like_this_field query
          17. The range query
          18. The dismax query
          19. The regular expression query
        4. Compound queries
          1. The bool query
          2. The boosting query
          3. The constant_score query
          4. The indices query
        5. Filtering your results
          1. Using filters
          2. Filter types
            1. The range filter
            2. The exists filter
            3. The missing filter
            4. The script filter
            5. The type filter
            6. The limit filter
            7. The identifiers filter
            8. If this is not enough
            9. Combining filters
              1. A word about the bool filter
            10. Named filters
          3. Caching filters
        6. Highlighting
          1. Getting started with highlighting
          2. Field configuration
          3. Under the hood
          4. Configuring HTML tags
          5. Controlling the highlighted fragments
          6. Global and local settings
          7. Require matching
          8. The postings highlighter
        7. Validating your queries
          1. Using the validate API
        8. Sorting data
          1. Default sorting
          2. Selecting fields used for sorting
          3. Specifying the behavior for missing fields
          4. Dynamic criteria
          5. Collation and national characters
        9. Query rewrite
          1. An example of the rewrite process
          2. Query rewrite properties
        10. Summary
      14. 4. Extending Your Index Structure
        1. Indexing tree-like structures
          1. Data structure
          2. Analysis
        2. Indexing data that is not flat
          1. Data
          2. Objects
          3. Arrays
          4. Mappings
            1. Final mappings
          5. Sending the mappings to Elasticsearch
          6. To be or not to be dynamic
        3. Using nested objects
          1. Scoring and nested queries
        4. Using the parent-child relationship
          1. Index structure and data indexing
            1. Parent mappings
            2. Child mappings
            3. The parent document
            4. The child documents
          2. Querying
            1. Querying data in the child documents
              1. The top children query
            2. Querying data in the parent documents
          3. The parent-child relationship and filtering
          4. Performance considerations
        5. Modifying your index structure with the update API
          1. The mappings
          2. Adding a new field
          3. Modifying fields
        6. Summary
      15. 5. Make Your Search Better
        1. An introduction to Apache Lucene scoring
          1. When a document is matched
          2. Default scoring formula
          3. Relevancy matters
        2. Scripting capabilities of Elasticsearch
          1. Objects available during script execution
          2. MVEL
          3. Using other languages
          4. Using our own script library
            1. Using native code
              1. The factory implementation
              2. Implementing the native script
              3. Installing scripts
              4. Running the script
        3. Searching content in different languages
          1. Handling languages differently
          2. Handling multiple languages
          3. Detecting the language of the documents
          4. Sample document
          5. The mappings
          6. Querying
            1. Queries with the identified language
            2. Queries with unknown languages
            3. Combining queries
        4. Influencing scores with query boosts
          1. The boost
          2. Adding boost to queries
          3. Modifying the score
            1. The constant_score query
            2. The boosting query
            3. The function_score query
              1. The structure of the function query
            4. Deprecated queries
              1. Replacing the custom_boost_factor query
              2. Replacing the custom_score query
              3. Replacing the custom_filters_score query
        5. When does index-time boosting make sense?
          1. Defining field boosting in input data
          2. Defining boosting in mapping
        6. Words with the same meaning
          1. The synonym filter
            1. Synonyms in the mappings
            2. Synonyms stored in the filesystem
          2. Defining synonym rules
            1. Using Apache Solr synonyms
              1. Explicit synonyms
              2. Equivalent synonyms
              3. Expanding synonyms
            2. Using WordNet synonyms
          3. Query- or index-time synonym expansion
        7. Understanding the explain information
          1. Understanding field analysis
          2. Explaining the query
        8. Summary
      16. 6. Beyond Full-text Searching
        1. Aggregations
          1. General query structure
          2. Available aggregations
            1. Metric aggregations
              1. Min, max, sum, and avg aggregations
                1. Using scripts
              2. The value_count aggregation
              3. The stats and extended_stats aggregations
            2. Bucketing
              1. The terms aggregation
              2. The range aggregation
              3. The date_range aggregation
              4. IPv4 range aggregation
              5. The missing aggregation
              6. Nested aggregation
              7. The histogram aggregation
              8. The date_histogram aggregation
                1. Time zones
              9. The geo_distance aggregation
              10. The geohash_grid aggregation
          3. Nesting aggregations
          4. Bucket ordering and nested aggregations
          5. Global and subsets
            1. Inclusions and exclusions
        2. Faceting
          1. The document structure
          2. Returned results
          3. Using queries for faceting calculations
          4. Using filters for faceting calculations
          5. Terms faceting
          6. Ranges based faceting
            1. Choosing different fields for an aggregated data calculation
          7. Numerical and date histogram faceting
            1. The date_histogram facet
          8. Computing numerical field statistical data
          9. Computing statistical data for terms
          10. Geographical faceting
          11. Filtering faceting results
          12. Memory considerations
        3. Using suggesters
          1. Available suggester types
          2. Including suggestions
            1. The suggester response
          3. The term suggester
            1. The term suggester configuration options
            2. Additional term suggester options
          4. The phrase suggester
            1. Configuration
          5. The completion suggester
            1. Indexing data
            2. Querying the indexed completion suggester data
            3. Custom weights
        4. Percolator
          1. The index
          2. Percolator preparation
          3. Getting deeper
            1. Getting the number of matching queries
            2. Indexed documents percolation
        5. Handling files
          1. Adding additional information about the file
        6. Geo
          1. Mappings preparation for spatial search
          2. Example data
          3. Sample queries
            1. Distance-based sorting
            2. Bounding box filtering
            3. Limiting the distance
          4. Arbitrary geo shapes
            1. Point
            2. Envelope
            3. Polygon
            4. Multipolygon
            5. An example usage
            6. Storing shapes in the index
        7. The scroll API
          1. Problem definition
          2. Scrolling to the rescue
        8. The terms filter
          1. Terms lookup
            1. The terms lookup query structure
            2. Terms lookup cache settings
        9. Summary
      17. 7. Elasticsearch Cluster in Detail
        1. Node discovery
          1. Discovery types
          2. The master node
            1. Configuring the master and data nodes
            2. The master-election configuration
          3. Setting the cluster name
            1. Configuring multicast
            2. Configuring unicast
          4. Ping settings for nodes
        2. The gateway and recovery modules
          1. The gateway
          2. Recovery control
            1. Additional gateway recovery options
        3. Preparing Elasticsearch cluster for high query and indexing throughput
          1. The filter cache
          2. The field data cache and circuit breaker
            1. The circuit breaker
          3. The store
          4. Index buffers and the refresh rate
            1. The index refresh rate
          5. The thread pool configuration
          6. Combining it all together – some general advice
            1. Choosing the right store
            2. The index refresh rate
            3. Tuning the thread pools
            4. Tuning your merge process
            5. The field data cache and breaking the circuit
            6. RAM buffer for indexing
            7. Tuning transaction logging
            8. Things to keep in mind
        4. Templates and dynamic templates
          1. Templates
            1. An example of a template
            2. Storing templates in files
          2. Dynamic templates
            1. The matching pattern
            2. Field definitions
        5. Summary
      18. 8. Administrating Your Cluster
        1. The Elasticsearch time machine
          1. Creating a snapshot repository
          2. Creating snapshots
            1. Additional parameters
          3. Restoring a snapshot
          4. Cleaning up – deleting old snapshots
        2. Monitoring your cluster's state and health
          1. The cluster health API
            1. Controlling information details
            2. Additional parameters
          2. The indices stats API
            1. Docs
            2. Store
            3. Indexing, get, and search
            4. Additional information
          3. The status API
          4. The nodes info API
          5. The nodes stats API
          6. The cluster state API
          7. The pending tasks API
          8. The indices segments API
          9. The cat API
            1. Limiting returned information
        3. Controlling cluster rebalancing
          1. Rebalancing
          2. Cluster being ready
          3. The cluster rebalance settings
            1. Controlling when rebalancing will start
            2. Controlling the number of shards being moved between nodes concurrently
            3. Controlling the number of shards initialized concurrently on a single node
            4. Controlling the number of primary shards initialized concurrently on a single node
            5. Controlling types of shards allocation
            6. Controlling the number of concurrent streams on a single node
        4. Controlling the shard and replica allocation
          1. Explicitly controlling allocation
            1. Specifying node parameters
            2. Configuration
            3. Index creation
            4. Excluding nodes from allocation
            5. Requiring node attributes
            6. Using IP addresses for shard allocation
            7. Disk-based shard allocation
              1. Enabling disk-based shard allocation
              2. Configuring disk-based shard allocation
          2. Cluster wide allocation
          3. Number of shards and replicas per node
          4. Moving shards and replicas manually
            1. Moving shards
            2. Canceling shard allocation
            3. Forcing shard allocation
            4. Multiple commands per HTTP request
        5. Warming up
          1. Defining a new warming query
          2. Retrieving the defined warming queries
          3. Deleting a warming query
          4. Disabling the warming up functionality
          5. Choosing queries
        6. Index aliasing and using it to simplify your everyday work
          1. An alias
          2. Creating an alias
          3. Modifying aliases
          4. Combining commands
          5. Retrieving all aliases
          6. Removing aliases
          7. Filtering aliases
          8. Aliases and routing
        7. Elasticsearch plugins
          1. The basics
          2. Installing plugins
          3. Removing plugins
        8. The update settings API
        9. Summary
      19. Index