Relation Extraction
Once named entities have been identified in a text, we then want
to extract the relations that exist between them. As indicated earlier,
we will typically be looking for relations between specified types of
named entity. One way of approaching this task is to first look for
all triples of the form (X, α, Y), where X and Y are named entities of
the required types, and α is the string of words that intervenes between
X and Y. We can then use regular expressions to pull out just those
instances of α that express the relation that we are looking for. The
following example searches for strings that contain the word "in". The
special regular expression (?!\b.+ing) is a negative lookahead assertion
that allows us to disregard strings such as "success in supervising the
transition of", where "in" is followed by a gerund.
>>> import re
>>> import nltk
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                                      corpus='ieer', pattern=IN):
...         print(nltk.sem.rtuple(rel))
[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
[ORG: 'McGlashan & Sarrail'] 'firm in' [LOC: 'San Mateo']
[ORG: 'Freedom Forum'] 'in' [LOC: 'Arlington']
[ORG: 'Brookings Institution'] ', the research group in' [LOC: 'Washington']
[ORG: 'Idealab'] ', a self-described business incubator based in' [LOC: 'Los Angeles']
[ORG: 'Open Text'] ', based in' [LOC: 'Waterloo']
[ORG: 'WGBH'] 'in' [LOC: 'Boston']
[ORG: 'Bastille Opera'] 'in' ...
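To make the two steps concrete without the IEER corpus, here is a minimal sketch that uses only the standard re module. It is not how nltk.sem.extract_rels is implemented. The flat chunk representation, the helper names candidate_triples and extract, and the entities "Example Corp", "Exampleville", and "Elsewhere" are all assumptions made up for illustration. Only the pattern IN and the sample fillers come from the example above.

```python
import re

# Same pattern as above: the whole word "in", rejected when the text
# after it contains a stretch ending in "ing".
IN = re.compile(r'.*\bin\b(?!\b.+ing)')

def candidate_triples(chunks, x_type, y_type):
    """Yield (X, alpha, Y) for consecutive entities of the requested types.

    `chunks` is a hypothetical flat representation: plain strings are
    ordinary tokens, and (label, text) tuples are named entities.
    """
    last_entity = None  # most recent entity seen, as (label, text)
    filler = []         # tokens seen since that entity
    for item in chunks:
        if isinstance(item, tuple):
            if (last_entity is not None
                    and last_entity[0] == x_type
                    and item[0] == y_type):
                yield last_entity[1], ' '.join(filler), item[1]
            last_entity = item
            filler = []
        else:
            filler.append(item)

def extract(chunks, x_type, y_type, pattern):
    """Keep only the triples whose filler alpha matches `pattern`."""
    return [(x, alpha, y)
            for x, alpha, y in candidate_triples(chunks, x_type, y_type)
            if pattern.match(alpha)]

# Hand-built input; the last two entity pairs are invented test cases.
chunks = [
    ('ORG', 'WHYY'), 'in', ('LOC', 'Philadelphia'), 'and',
    ('ORG', 'Open Text'), ',', 'based', 'in', ('LOC', 'Waterloo'), ';',
    ('ORG', 'Example Corp'), 'success', 'in', 'supervising', 'the',
    'transition', 'of', ('LOC', 'Exampleville'), ';',
    ('ORG', 'Example Corp'), 'near', ('LOC', 'Elsewhere'),
]

for x, alpha, y in extract(chunks, 'ORG', 'LOC', IN):
    print("[ORG: %r] %r [LOC: %r]" % (x, alpha, y))
# [ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
# [ORG: 'Open Text'] ', based in' [LOC: 'Waterloo']
```

The filler "success in supervising the transition of" is dropped by the lookahead, and "near" is dropped because it does not contain the word "in" at all. Note that the lookahead is coarse: it rejects a filler whenever "ing" appears anywhere after "in" (for instance "in the building near"), not only when a gerund follows immediately.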