Elasticsearch Essentials

Book Description

Harness the power of Elasticsearch to build and manage scalable search and analytics solutions with this fast-paced guide

About This Book

  • New to Elasticsearch? This highly practical guide gives you a quick start with easy-to-follow examples, so you can get up and running with the Elasticsearch APIs in no time

  • Get the latest guide on Elasticsearch 2.0.0, with concise coverage of the issues a developer faces when handling data in bulk and tuning search relevancy

  • Learn to create large-scale Elasticsearch clusters using best practices

  • Learn from an expert: written by Bharvi Dixit, who has extensive experience working with search servers (especially Elasticsearch)

Who This Book Is For

This book is for anyone who wants to build efficient search and analytics applications. It is also useful for skilled developers, especially those experienced with Lucene or Solr, who want to learn Elasticsearch quickly.

What You Will Learn

  • Get to know advanced Elasticsearch concepts and REST APIs

  • Write CRUD operations and other search functionalities using the Elasticsearch Python and Java clients

  • Dig into a wide range of queries and find out how to use them correctly

  • Design schema and mappings with built-in and custom analyzers

  • Excel in data modeling concepts and query optimization

  • Master document relationships and geospatial data

  • Build analytics using aggregations

  • Set up and scale Elasticsearch clusters using best practices

  • Learn to take data backups and secure Elasticsearch clusters

In Detail

With constantly evolving and growing datasets, organizations need to find actionable insights for their business. Elasticsearch, which is the world's most advanced search and analytics engine, makes massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazing fast search solutions over a massive amount of data, but can also serve as a NoSQL data store.

This guide helps you quickly become a competent developer with a solid understanding of the core Elasticsearch concepts. Starting from the beginning, it covers these core concepts, setting up Elasticsearch and various plugins, working with analyzers, and creating mappings. It provides complete coverage of working with Elasticsearch using Python, including CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, it shows how to set up and scale Elasticsearch clusters in production environments and offers some best practices.

Style and approach

This is an easy-to-follow guide with practical examples and clear explanations of the concepts. This fast-paced book focuses mainly on practical implementation, with step-by-step examples, common errors and their solutions, and ample screenshots and code.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your Packt account. If you purchased this book elsewhere, you can register with Packt to obtain the code files.

Table of Contents

    1. Elasticsearch Essentials
      1. Table of Contents
      2. Elasticsearch Essentials
      3. Credits
      4. About the Author
      5. Acknowledgments
      6. About the Reviewer
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      8. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      9. 1. Getting Started with Elasticsearch
        1. Introducing Elasticsearch
          1. The primary features of Elasticsearch
          2. Understanding REST and JSON
            1. What is REST?
            2. What is JSON?
          3. Elasticsearch common terms
          4. Understanding Elasticsearch structure with respect to relational databases
        2. Installing and configuring Elasticsearch
          1. Installing Elasticsearch on Ubuntu through the Debian package
          2. Installing Elasticsearch on CentOS through the RPM package
          3. Understanding the Elasticsearch installation directory layout
          4. Configuring basic parameters
          5. Adding another node to the cluster
          6. Installing Elasticsearch plugins
            1. Checking for installed plugins
            2. Installing the Head plugin for Elasticsearch
            3. Installing Sense for Elasticsearch
        3. Basic operations with Elasticsearch
          1. Creating an Index
          2. Indexing a document in Elasticsearch
          3. Fetching documents
            1. Get a complete document
            2. Getting part of a document
          4. Updating documents
            1. Updating a whole document
            2. Updating documents partially
          5. Deleting documents
          6. Checking documents' existence
        4. Summary
      10. 2. Understanding Document Analysis and Creating Mappings
        1. Text search
          1. TF-IDF
          2. Inverted indexes
        2. Document analysis
          1. Introducing Lucene analyzers
          2. Creating custom analyzers
          3. Changing a default analyzer
          4. Putting custom analyzers into action
        3. Elasticsearch mapping
          1. Document metadata fields
          2. Data types and index analysis options
            1. Configuring data types
              1. String
              2. Number
              3. Date
              4. Boolean
              5. Arrays
              6. Objects
            2. Indexing the same field in different ways
          3. Putting mappings in an index
          4. Viewing mappings
          5. Updating mappings
        4. Summary
      11. 3. Putting Elasticsearch into Action
        1. CRUD operations using elasticsearch-py
          1. Setting up the environment
            1. Installing Pip
            2. Installing virtualenv
            3. Installing elasticsearch-py
          2. Performing CRUD operations
            1. Request timeouts
            2. Creating indexes with settings and mappings
            3. Indexing documents
            4. Retrieving documents
            5. Updating documents
              1. Replacing the value of a field completely
            6. Appending a value in an array
            7. Updates using doc
            8. Checking document existence
            9. Deleting a document
        2. CRUD operations using Java
          1. Connecting with Elasticsearch
          2. Indexing a document
          3. Fetching a document
          4. Updating a document
            1. Updating a document using doc
            2. Updating a document using script
          5. Deleting documents
        3. Creating a search database
        4. Elasticsearch Query-DSL
        5. Understanding Query-DSL parameters
          1. Query types
          2. Full-text search queries
            1. match_all
            2. match query
              1. Phrase search
            3. multi match
            4. query_string
          3. Term-based search queries
            1. Term query
            2. Terms query
            3. Range queries
            4. Exists queries
            5. Missing queries
          4. Compound queries
            1. Bool queries
            2. Not queries
        6. Search requests using Python
        7. Search requests using Java
          1. Parsing search responses
        8. Sorting your data
          1. Sorting documents by field values
          2. Sorting on more than one field
          3. Sorting multivalued fields
          4. Sorting on string fields
        9. Document routing
        10. Summary
      12. 4. Aggregations for Analytics
        1. Introducing the aggregation framework
          1. Aggregation syntax
          2. Extracting values
          3. Returning only aggregation results
        2. Metric aggregations
          1. Computing basic stats
            1. Combined stats
            2. Computing stats separately
          2. Computing extended stats
          3. Finding distinct counts
        3. Bucket aggregations
          1. Terms aggregation
          2. Range aggregation
          3. Date range aggregation
          4. Histogram aggregation
          5. Date histogram aggregation
          6. Filter-based aggregation
        4. Combining search, buckets, and metrics
        5. Memory pressure and implications
        6. Summary
      13. 5. Data Looks Better on Maps: Master Geo-Spatiality
        1. Introducing geo-spatial data
        2. Working with geo-point data
          1. Mapping geo-point fields
          2. Indexing geo-point data
          3. Querying geo-point data
            1. Geo distance query
            2. Geo distance range query
            3. Geo bounding box query
              1. Understanding bounding boxes
          4. Sorting by distance
        3. Geo-aggregations
          1. Geo distance aggregation
          2. Using bounding boxes with geo distance aggregation
        4. Geo-shapes
          1. Point
          2. Linestring
          3. Circles
          4. Polygons
          5. Envelopes
          6. Mapping geo-shape fields
          7. Indexing geo-shape data
          8. Querying geo-shape data
        5. Summary
      14. 6. Document Relationships in NoSQL World
        1. Relational data in the document-oriented NoSQL world
          1. Managing relational data in Elasticsearch
        2. Working with nested objects
          1. Creating nested mappings
          2. Indexing nested data
          3. Querying nested type data
            1. Nested aggregations
            2. Nested aggregation
              1. Understanding nested aggregation syntax
            3. Reverse nested aggregation
        3. Parent-child relationships
          1. Creating parent-child mappings
          2. Indexing parent-child documents
          3. Querying parent-child documents
            1. has_child query
            2. has_parent query
        4. Considerations for using document relationships
        5. Summary
      15. 7. Different Methods of Search and Bulk Operations
        1. Introducing search types in Elasticsearch
        2. Cheaper bulk operations
          1. Bulk create
          2. Bulk indexing
          3. Bulk updating
          4. Bulk deleting
        3. Multi get and multi search APIs
          1. Multi get
          2. Multi searches
        4. Data pagination
          1. Pagination with scoring
          2. Pagination without scoring
            1. Scrolling and re-indexing documents using scan-scroll
        5. Practical considerations for bulk processing
        6. Summary
      16. 8. Controlling Relevancy
        1. Introducing relevant searches
          1. The Elasticsearch out-of-the-box tools
            1. An example: why defaults are not enough
          2. Controlling relevancy with custom scoring
          3. The function_score query
            1. weight
            2. field_value_factor
            3. script_score
            4. Decay functions - linear, exp, and gauss
        2. Summary
      17. 9. Cluster Scaling in Production Deployments
        1. Node types in Elasticsearch
          1. Client node
          2. Data node
          3. Master node
        2. Introducing Zen-Discovery
          1. Multicasting discovery
          2. Unicasting discovery
            1. Configuring unicasting discovery
              1. Minimum number of master nodes: preventing split-brain
                1. An initial list of hosts to ping
                2. Ping timeout
        3. Node upgrades without downtime
        4. Upgrading Elasticsearch version
        5. Best Elasticsearch practices in production
        6. Creating a cluster
        7. Scaling your clusters
          1. When to scale
            1. Metrics to watch
              1. CPU utilization
              2. Memory utilization
              3. Disk I/O utilization
              4. Disk low watermark
          2. How to scale
        8. Summary
      18. 10. Backups and Security
        1. Introducing backup and restore mechanisms
          1. Backup using snapshot API
            1. Creating an NFS drive
              1. Configuring the NFS host server
              2. Configuring client machines
            2. Creating a snapshot
              1. Registering the repository path
              2. Registering the shared file system repository in Elasticsearch
              3. Create your first snapshot
              4. Getting snapshot information
              5. Deleting snapshots
          2. Restoring snapshots
            1. Restoring multiple indices
            2. Renaming indices
            3. Partial restore
            4. Changing index settings during restore
            5. Restoring to a different cluster
          3. Manual backups
          4. Manual restoration
        2. Securing Elasticsearch
          1. Setting up basic HTTP authentication
          2. Setting up Nginx
          3. Securing critical access
            1. Restricting DELETE requests
            2. Restricting endpoints
          4. Load balancing using Nginx
        3. Summary
      19. Index