Elasticsearch for indexing and searching

Elasticsearch (http://www.elasticsearch.org/) is an Apache 2.0 licensed open source search solution built on top of Apache Lucene. It is a distributed, multi-tenant, document-oriented search engine. Elasticsearch supports distributed deployments by breaking an index down into shards and distributing those shards across the nodes in the cluster. While both Elasticsearch and Apache Solr use Apache Lucene as the core search engine, Elasticsearch aims to provide a more scalable, distributed solution that is better suited to cloud environments than Apache Solr.
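To make the idea of sharding concrete, the following is a minimal Python sketch of how an index might be split into shards and spread over nodes. It is an illustration only, not Elasticsearch's implementation: the function names route_to_shard and assign_shards, the node names, and the example URLs are hypothetical, CRC32 stands in for whatever hash function the engine actually uses, and the round-robin placement is a simplifying assumption.

```python
import zlib
from typing import Dict, List


def route_to_shard(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number by hashing the ID.

    CRC32 is an illustrative stand-in hash, chosen because it is
    deterministic across runs (unlike Python's built-in hash()).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(num_shards: int, nodes: List[str]) -> Dict[int, str]:
    """Place shards on nodes round-robin (a simplifying assumption)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster nodes
    num_shards = 5
    placement = assign_shards(num_shards, nodes)

    # Hypothetical crawled pages, keyed by URL.
    for doc_id in ["http://example.com/", "http://example.com/about"]:
        shard = route_to_shard(doc_id, num_shards)
        print(doc_id, "-> shard", shard, "on", placement[shard])
```

Because the shard is derived from the document ID alone, any node can work out where a document lives without consulting a central lookup table. The same property means that, in this simple scheme, changing the number of shards would remap most documents.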

Getting ready

Install Apache Nutch and crawl some web pages as per the Intradomain web crawling using Apache Nutch or Whole ...

