Let’s start by looking at a simple context-free grammar (CFG). By convention, the lefthand side of the first production is the start-symbol of the grammar, S, and all well-formed trees must have this symbol as their root label. In NLTK, context-free grammars are defined in the nltk.grammar module. In Example 8-9 we define a grammar and show how to parse a simple sentence admitted by the grammar.
Example 8-9. A simple context-free grammar.
grammar1 = nltk.parse_cfg("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)
>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.nbest_parse(sent):
...     print tree
(S (NP Mary) (VP (V saw) (NP Bob)))
The grammar in Example 8-9 contains productions involving various syntactic categories, as laid out in Table 8-1. The recursive descent parser used here can also be inspected via a graphical interface, as illustrated in Figure 8-3; we discuss this parser in more detail in Parsing with Context-Free Grammar.
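To make the top-down strategy concrete, the following is a minimal sketch of a recursive descent parser over the same productions, written in plain Python 3 without NLTK. It is not NLTK's implementation: the names read_grammar, expand, expand_sequence, parse, and bracket are hypothetical helpers introduced only for this illustration, and the grammar is stored as a dictionary from each lefthand side to a list of alternative righthand sides.

```python
# Illustrative sketch only: every function name here is hypothetical,
# and none of this is NLTK's own code.

GRAMMAR_TEXT = """
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
"""


def read_grammar(text):
    """Map each lefthand side to a list of alternative righthand sides."""
    productions = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        # Each "|" separates one alternative expansion of the same symbol.
        alternatives = [alt.split() for alt in rhs.split("|")]
        productions.setdefault(lhs.strip(), []).extend(alternatives)
    return productions


def expand(symbol, tokens, start, grammar):
    """Yield (tree, next_position) for each way symbol can match at start."""
    if symbol.startswith('"'):
        # Quoted symbols are terminals: they must equal the next token.
        word = symbol.strip('"')
        if start < len(tokens) and tokens[start] == word:
            yield word, start + 1
        return
    # Nonterminals: try every alternative righthand side in turn.
    for rhs in grammar[symbol]:
        for children, end in expand_sequence(rhs, tokens, start, grammar):
            yield (symbol, children), end


def expand_sequence(symbols, tokens, start, grammar):
    """Yield (children, next_position) for a sequence of symbols."""
    if not symbols:
        yield [], start
        return
    for first, mid in expand(symbols[0], tokens, start, grammar):
        for rest, end in expand_sequence(symbols[1:], tokens, mid, grammar):
            yield [first] + rest, end


def parse(tokens, grammar, start_symbol="S"):
    """Yield every tree rooted in start_symbol that covers all tokens."""
    for tree, end in expand(start_symbol, tokens, 0, grammar):
        if end == len(tokens):
            yield tree


def bracket(tree):
    """Render a tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


grammar = read_grammar(GRAMMAR_TEXT)
parses = [bracket(t) for t in parse("Mary saw Bob".split(), grammar)]
print(parses)
```

For "Mary saw Bob" the sketch finds a single tree, matching the one printed in Example 8-9. Note that this naive strategy relies on the grammar having no left-recursive productions; a production whose righthand side begins with its own lefthand side would make expand recurse without consuming any input.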
Table 8-1. Syntactic categories
Symbol   Meaning                 Example
S        sentence                the man walked
VP       verb phrase             saw a park
PP       prepositional phrase    with a telescope
A production like VP -> V NP | V NP PP has a disjunction ...