Tika in Action by Jukka Zitting, Chris Mattmann

Chapter 4. Document type detection

 

This chapter covers

  • Introduction to MIME types
  • Working with MIME types in Tika
  • Identifying file formats

 

Let’s talk about taxonomy, the science of classification. Taxonomies identify and classify concepts so that we can understand them better and share a vocabulary for describing them. For example, the Linnaean taxonomy[1] is the classical system of naming biological organisms with two-part Latin names that identify both the genus (the category) and the specific species within it. The name Homo sapiens identifies the modern human species as a member of the genus Homo, alongside earlier human-like species such as the extinct Homo neanderthalensis. A similar taxonomy, ...
