Chapter 6. Understanding metadata

 

 

Conquering your fears of extracting text from files in a few lines of Java code has hopefully put Tika on your personal must-have list. The ease and simplicity with which Tika can turn an afternoon’s parsing work into a smorgasbord of content handler plugins and event-based text processing is likely fresh on your mind. If not, head back to chapter 5 and relive the memories.

Looking ahead, sometimes before you’ve even obtained the textual content within the files you’re interested in, you may be able to weed out which files you’re not interested in, based on a few simple criteria, and save yourself a bunch ...

Get Tika in Action now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.