Statistical Machine Translation by Philipp Koehn

Chapter 2

Words, Sentences, Corpora

This chapter is intended for readers who have little or no background in natural language processing. We introduce basic linguistic concepts and explain their relevance to statistical machine translation. Starting with words and their linguistic properties, we move up to sentences and the issues of syntax and semantics they raise. We also discuss the role that text corpora play in building a statistical machine translation system, and the basic methods used to obtain and prepare these data sources.

2.1 Words

Intuitively, the basic atomic unit of meaning is a word. For instance, the word house evokes the mental image of a rectangular building with a roof and smoking chimney, which may be surrounded by grass and trees and inhabited ...
