What’s the Use of Syntax?

Beyond n-grams

We gave an example in Chapter 2 of how to use the frequency information in bigrams to generate text that seems perfectly acceptable for small sequences of words but rapidly degenerates into nonsense. Here’s another pair of examples that we created by computing the bigrams over the text of a children’s story, The Adventures of Buster Brown (included in the Project Gutenberg Selection Corpus):

Example 8-4. 

  1. He roared with me the pail slip down his back

  2. The worst part and clumsy looking for whoever heard light

You intuitively know that these sequences are “word-salad,” but you probably find it hard to pin down what’s wrong with them. One benefit of studying grammar is that it provides a conceptual framework and vocabulary for spelling out these intuitions. Let’s take a closer look at the sequence “the worst part and clumsy looking.” This looks like a coordinate structure, where two phrases are joined by a coordinating conjunction such as “and,” “but,” or “or.” Here’s an informal (and simplified) statement of how coordination works syntactically:

Coordinate Structure: if v1 and v2 are both phrases of grammatical category X, then v1 and v2 is also a phrase of category X.
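To make the rule concrete, here is a minimal sketch of a brute-force recognizer for a tiny hand-written grammar. For each phrasal category X, it includes one instance of the Coordinate Structure rule, X → X Conj X. The lexicon, the rules, the intermediate category `Nom`, and the function name `parse_categories` are all hypothetical and chosen only for illustration. Treating “looking” as an adjective, so that “clumsy looking” forms an AP, is a deliberate simplification. NLTK’s own `nltk.CFG` and chart parser classes would be the usual tool; this sketch avoids them to stay self-contained.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word -> set of lexical categories.
LEXICON = {
    "the": {"Det"},
    "worst": {"Adj"}, "best": {"Adj"}, "clumsy": {"Adj"},
    "looking": {"Adj"}, "slow": {"Adj"},
    "part": {"N"},
    "and": {"Conj"}, "but": {"Conj"}, "or": {"Conj"},
}
PHRASAL = ["NP", "AP"]
# Hypothetical toy rules: (left-hand side, right-hand-side category sequence).
RULES = [
    ("NP", ("Det", "Nom")),
    ("Nom", ("Adj", "Nom")),
    ("Nom", ("N",)),
    ("AP", ("Adj",)),
    ("AP", ("Adj", "AP")),
]
# Coordinate Structure rule, one instance per phrasal category X: X -> X Conj X
RULES += [(x, (x, "Conj", x)) for x in PHRASAL]

def parse_categories(words):
    """Return the set of categories that span the whole word sequence."""

    @lru_cache(maxsize=None)
    def cats(i, j):
        # Lexical categories apply only to single-word spans.
        found = set(LEXICON.get(words[i], set())) if j - i == 1 else set()
        # Rules with two or more daughters split the span into smaller pieces.
        for lhs, rhs in RULES:
            if len(rhs) > 1 and covers(rhs, i, j):
                found.add(lhs)
        # Close the span's categories under the unary rules (e.g. Nom -> N).
        changed = True
        while changed:
            changed = False
            for lhs, rhs in RULES:
                if len(rhs) == 1 and rhs[0] in found and lhs not in found:
                    found.add(lhs)
                    changed = True
        return frozenset(found)

    def covers(rhs, i, j):
        """Can the category sequence rhs cover words[i:j], each part nonempty?"""
        if len(rhs) == 1:
            return rhs[0] in cats(i, j)
        return any(rhs[0] in cats(i, k) and covers(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return cats(0, len(words))

for phrase in ["the worst part and the best part",
               "slow and clumsy",
               "the worst part and clumsy looking"]:
    print(phrase, "->", sorted(parse_categories(phrase.split())) or "no single category")
```

Under this toy grammar, both conjuncts of “the worst part and the best part” are NPs, so the whole sequence is an NP. In “the worst part and clumsy looking,” an NP is conjoined with an AP. The rule requires both conjuncts to share a category, so no single category covers the sequence, which is one way of spelling out why it sounds wrong.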

Here are a couple of examples. In the first, two NPs (noun phrases) have been conjoined to make an NP, while in the second, two APs (adjective phrases) have been conjoined to make an AP.

Example 8-5. 

  1. The book’s ending was (NP the worst part and the best part) for me.

  2. On land they are (AP slow ...
