Chapter 4. Document type detection

This chapter covers

  • Introduction to MIME types
  • Working with MIME types in Tika
  • Identifying file formats

Let’s talk about taxonomy. Taxonomy is the science of classification. Taxonomies are used to identify and classify concepts, both to understand them better and to give us a shared vocabulary for describing things. For example, the Linnaean taxonomy[1] is the classical system of naming all biological organisms using two-part Latin names that identify both the genus (the category) and the specific species within that genus. The term Homo sapiens identifies the modern human species as part of the genus Homo, which groups it with earlier human-like species such as the extinct Homo neanderthalensis. A similar taxonomy, ...