Configuring Solr with Nutch

Apache Solr is straightforward to configure for use with Apache Nutch. The following steps integrate Nutch with Solr:

  1. Create a new core named nutch-example in Solr by copying the nutch-example folder from the Chapter 7 code that comes with this book into the Solr home directory.
  2. Restart the Solr instance so that it loads the new core.
  3. With Solr restarted, crawl some data using Nutch and index it into Solr. To do this, navigate to the %NUTCH_HOME% folder and execute the following command:

    $ bin/crawl

    Run without arguments, the script prints its usage message:

    Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
     -i|--index Indexes crawl results into a configured indexer ...
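The three steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not code from the book: the helper names `install_core` and `build_crawl_command`, the seed and crawl directory names (`urls`, `crawl`), the core URL `http://localhost:8983/solr/nutch-example`, and the `solr.server.url` property passed through `-D` are all hypothetical choices. That property name matches the Solr indexer in Nutch 1.x releases of that period; later releases configure the Solr endpoint in `conf/index-writers.xml` instead, so check the version you run. The command builder follows only the usage line shown above.

```python
"""Hypothetical helpers sketching steps 1-3. Paths, the core URL and the
solr.server.url property are assumptions, not taken from the book."""
import shlex
import shutil
from pathlib import Path
from typing import List, Optional


def install_core(example_dir: Path, solr_home: Path,
                 core_name: str = "nutch-example") -> Path:
    """Copy the example core folder into the Solr home directory (step 1).

    Assumes the folder contains a core.properties file, so Solr's core
    discovery registers it on the next start (hence the restart in step 2).
    """
    target = solr_home / core_name
    if target.exists():
        raise FileExistsError(f"core folder already exists: {target}")
    shutil.copytree(example_dir, target)
    return target


def build_crawl_command(seed_dir: str, crawl_dir: str, num_rounds: int,
                        solr_url: Optional[str] = None,
                        index: bool = True) -> List[str]:
    """Assemble a bin/crawl invocation matching the usage line
    crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>."""
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    cmd = ["bin/crawl"]
    if index:
        # -i asks the script to index crawl results into the configured indexer.
        cmd.append("-i")
    if solr_url is not None:
        # Assumption: this Nutch version reads the Solr URL from solr.server.url.
        cmd += ["-D", f"solr.server.url={solr_url}"]
    cmd += [seed_dir, crawl_dir, str(num_rounds)]
    return cmd


if __name__ == "__main__":
    cmd = build_crawl_command(
        "urls", "crawl", 2,
        solr_url="http://localhost:8983/solr/nutch-example")
    # Print a shell-ready command line; run it from %NUTCH_HOME%,
    # e.g. with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps each argument intact if you later hand it to `subprocess.run`; `shlex.join` is used only to show the equivalent shell command. Restarting Solr between the two helpers is left to however your installation manages the Solr process.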

Get Apache Solr for Indexing Data now with the O’Reilly learning platform.