Chapter 8

Automatic Language Identification

M. Zampieri*, Saarland University, Saarbrücken, Germany; German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany

Abstract

Automatic language identification, or simply language identification, is the task of automatically identifying the language(s) contained in a given document. It is an important part of many text processing pipelines, including text mining applications. This chapter provides a concise overview of language identification research, from early approaches to state-of-the-art methods.

Keywords

Language identification

Text classification

n-grams

Acknowledgements

The author would like to thank Binyam Gebrekidan Gebre and Nikola Ljubešić for commenting on a draft ...
