Foreword

I’m a big fan of search engines and Java, so early in the year 2004 I was looking for a good Java-based open source project on search engines. I quickly discovered Nutch. Nutch is an open source search engine project from the Apache Software Foundation. It was initiated by Doug Cutting, the well-known father of Lucene.

With my new toy on my laptop, I tested and tried to evaluate it. Even if Nutch was in its early stages, it was a promising project—exactly what I was looking for. I proposed my first patches to Nutch relating to language identification in early 2005. Then, in the middle of 2005 I become a Nutch committer and increased my number of contributions relating to language identification, content-type guessing, and document ...

Get Tika in Action now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.