Creating POS-tagged corpora

A corpus is a collection of documents. Corpora is the plural of corpus: a collection of more than one corpus.

The following code creates a data directory inside the home directory:

>>> import os
>>> import nltk.data
>>> create = os.path.expanduser('~/nltkdoc')
>>> if not os.path.exists(create):
...     os.mkdir(create)
...
>>> os.path.exists(create)
True
>>> # ~/nltkdoc is not on NLTK's default search path, so register it
>>> nltk.data.path.append(create)
>>> create in nltk.data.path
True

This code creates a data directory named ~/nltkdoc inside the home directory. The os.path.exists(create) call returns True once the directory exists. The last line checks something different: whether that directory is in nltk.data.path, the list of locations NLTK searches for data. If the last line returns False, NLTK is not searching that directory, and it has to be added to nltk.data.path ...
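
The session makes two separate checks: whether the directory exists on disk, and whether it is listed in nltk.data.path. A minimal sketch that handles both in one step might look like the following. The helper name ensure_data_dir is hypothetical and not part of NLTK. It takes the search path as a plain list argument, on the assumption that nltk.data.path behaves as an ordinary Python list of directory names.

```python
import os


def ensure_data_dir(directory, search_path):
    """Create `directory` if needed and register it on `search_path`.

    Hypothetical helper, not part of NLTK. `search_path` is any list of
    directory names, for example nltk.data.path.
    Returns the directory name with a leading ~ expanded.
    """
    directory = os.path.expanduser(directory)
    # exist_ok=True makes the call safe to repeat on an existing directory
    os.makedirs(directory, exist_ok=True)
    # Being on the search path is separate from existing on disk
    if directory not in search_path:
        search_path.append(directory)
    return directory
```

Under those assumptions, calling ensure_data_dir('~/nltkdoc', nltk.data.path) after import nltk.data would leave the directory both created and searchable. Calling it a second time changes nothing.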
