
Scaling Big Data with Hadoop and Solr by Hrishikesh Karambelkar


Deep dive – shards and indexing data of Apache Solr

We introduced sharding in Chapter 3, Making Big Data Work for Hadoop and Solr. As data is loaded into Apache Solr, the Solr index grows, since each index contains many files, documents, and records, and it eventually becomes too large to fit on a single machine. As the index grows, search queries can also slow down. A single Solr machine additionally suffers from concurrency issues and limited I/O capacity. Together, these problems call for distributing the index across multiple machines. Solr can run a distributed query across multiple machines and aggregate the results into one response.
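As a rough illustration of a distributed query, the following minimal sketch builds a request URL that uses Solr's `shards` request parameter to list the machines to search. The host names, ports, core paths, and the helper name `build_distributed_query` are assumptions made for this example only; they do not come from the book.

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to several shards.

    base_url: the Solr instance that receives the request and merges results
              (hypothetical address in the example below).
    shards:   list of "host:port/path" strings, one per shard (assumed values).
    """
    params = {
        "q": query,
        # Solr expects the shard addresses as one comma-separated value.
        "shards": ",".join(shards),
        "rows": rows,
        "wt": "json",
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


url = build_distributed_query(
    "http://localhost:8983/solr",                    # assumed receiving node
    ["localhost:8983/solr", "localhost:7574/solr"],  # assumed shard addresses
    "title:hadoop",
)
print(url)
```

The node that receives this request forwards the query to each listed shard and merges the partial results, so the client sees a single result set.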

With the release of Solr 4.1, lots of these things ...
