Using the Solr 1045 patch – map-side indexing

The Apache Solr 1045 patch gives Solr users a way to build Solr indexes with the MapReduce framework of Apache Hadoop. Once created, an index can be pushed to Solr storage. The following diagram depicts the mapper and reducer in Hadoop:

[Diagram: Using the Solr 1045 patch – map-side indexing]

Each Apache Hadoop mapper transforms its input records into a set of key-value pairs, which are then transformed into SolrInputDocument objects. The mapper task ends up creating an index from these SolrInputDocument objects.
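The mapper flow described above can be sketched in plain Python. This is a conceptual illustration only, not the patch's code or the Hadoop or Solr API: the tab-separated record layout, the field names, and the helpers `to_key_values`, `to_document`, and `map_task` are all assumptions made for the example, and a plain dict stands in for SolrInputDocument.

```python
from collections import defaultdict

# Hypothetical record layout: "id<TAB>title<TAB>body" (an assumption for this sketch).
FIELDS = ("id", "title", "body")


def to_key_values(record):
    """Turn one input record into (field, value) pairs."""
    return list(zip(FIELDS, record.rstrip("\n").split("\t")))


def to_document(pairs):
    """Stand-in for SolrInputDocument: a plain dict of field -> value."""
    return dict(pairs)


def map_task(records):
    """Build one small inverted index (term -> sorted doc ids) for one mapper's input split."""
    postings = defaultdict(set)
    for record in records:
        doc = to_document(to_key_values(record))
        for field in ("title", "body"):
            for term in doc[field].lower().split():
                postings[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in postings.items()}


# One mapper's input split (made-up sample data).
split = [
    "1\tSolr basics\tindexing with solr",
    "2\tHadoop basics\tmap and reduce",
]
index = map_task(split)
```

Each mapper produces its own index in this sketch, so the indexing work scales with the number of input splits.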

The reducer's job is to de-duplicate the indexes created by the mappers and to merge them if needed. Once the indexes are created, you can load them on your Solr instance ...
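As a rough illustration of the de-duplicate-and-merge step, the following Python sketch combines several per-mapper indexes into one. It assumes each partial index is a dict mapping a term to a list of document ids; that representation and the `reduce_task` helper are hypothetical and do not reflect how the patch or Lucene stores or merges index segments.

```python
def reduce_task(partial_indexes):
    """Merge per-mapper indexes, dropping duplicate doc ids for each term."""
    merged = {}
    for partial in partial_indexes:
        for term, doc_ids in partial.items():
            # A set removes ids that more than one mapper emitted for the same term.
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


# Two partial indexes with an overlapping entry (made-up sample data).
mapper_a = {"solr": ["1"], "basics": ["1", "2"]}
mapper_b = {"solr": ["1", "3"], "hadoop": ["2"]}
merged = reduce_task([mapper_a, mapper_b])
```

Document "1" appears under "solr" in both partial indexes but only once in the merged result.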
