Natural Language Processing: Python and NLTK

Book Description

Learn to build expert NLP and machine learning projects using NLTK and other Python libraries

About This Book

  • Break text down into its component parts for spelling correction, feature extraction, and phrase transformation

  • Work through NLP concepts with simple and easy-to-follow programming recipes

  • Gain insights into the current and budding research topics of NLP

Who This Book Is For

If you are an NLP or machine learning enthusiast and an intermediate Python programmer who wants to quickly master NLTK for natural language processing, then this Learning Path will do you a lot of good. Students of linguistics and semantic/sentiment analysis professionals will find it invaluable.

What You Will Learn

  • The complexity of natural language and how machines process it

  • Clean and wrangle text using tokenization and chunking to help you process data better

  • Tokenize text into sentences and sentences into words

  • Classify text and perform sentiment analysis

  • Implement string matching algorithms and normalization techniques

  • Understand and implement the concepts of information retrieval and text summarization

  • Find out how to implement various NLP tasks in Python

In Detail

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It enables seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. The number of human-computer interactions is increasing, so it’s becoming imperative that computers comprehend all major natural languages.

The first module, NLTK Essentials, is an introduction to building systems around NLP, with a focus on creating a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, gain practical insight into the open source tools and libraries available in Python, see how to analyze social media sites, and get tools for dealing with large-scale text. This module also makes use of Python libraries such as NLTK, scikit-learn, pandas, and NumPy.

The second module, Python 3 Text Processing with NLTK 3 Cookbook, teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods.

The third module, Mastering Natural Language Processing with Python, will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products:

  • NLTK Essentials by Nitin Hardeniya

  • Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins

  • Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur

Style and Approach

This comprehensive course creates a smooth learning path that teaches you how to get started with Natural Language Processing and build effective NLP and machine learning projects using Python and NLTK.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

    1. Natural Language Processing: Python and NLTK
      1. Table of Contents
      2. Natural Language Processing: Python and NLTK
      3. Natural Language Processing: Python and NLTK
      4. Credits
      5. Preface
        1. What this learning path covers
        2. What you need for this learning path
        3. Who this learning path is for
        4. Reader feedback
        5. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      6. 1. Module 1
        1. 1. Introduction to Natural Language Processing
          1. Why learn NLP?
          2. Let's start playing with Python!
            1. Lists
            2. Helping yourself
            3. Regular expressions
            4. Dictionaries
            5. Writing functions
          3. Diving into NLTK
          4. Your turn
          5. Summary
        2. 2. Text Wrangling and Cleansing
          1. What is text wrangling?
          2. Text cleansing
          3. Sentence splitter
          4. Tokenization
          5. Stemming
          6. Lemmatization
          7. Stop word removal
          8. Rare word removal
          9. Spell correction
          10. Your turn
          11. Summary
        3. 3. Part of Speech Tagging
          1. What is part-of-speech tagging?
            1. Stanford tagger
            2. Diving deep into a tagger
            3. Sequential tagger
              1. N-gram tagger
              2. Regex tagger
            4. Brill tagger
            5. Machine learning based tagger
          2. Named Entity Recognition (NER)
            1. NER tagger
          3. Your turn
          4. Summary
        4. 4. Parsing Structure in Text
          1. Shallow versus deep parsing
          2. The two approaches in parsing
          3. Why we need parsing
          4. Different types of parsers
            1. A recursive descent parser
            2. A shift-reduce parser
            3. A chart parser
            4. A regex parser
          5. Dependency parsing
          6. Chunking
          7. Information extraction
            1. Named-entity recognition (NER)
            2. Relation extraction
          8. Summary
        5. 5. NLP Applications
          1. Building your first NLP application
          2. Other NLP applications
            1. Machine translation
            2. Statistical machine translation
            3. Information retrieval
              1. Boolean retrieval
              2. Vector space model
              3. The probabilistic model
            4. Speech recognition
            5. Text classification
            6. Information extraction
            7. Question answering systems
            8. Dialog systems
            9. Word sense disambiguation
            10. Topic modeling
            11. Language detection
            12. Optical character recognition
          3. Summary
        6. 6. Text Classification
          1. Machine learning
          2. Text classification
          3. Sampling
            1. Naive Bayes
            2. Decision trees
            3. Stochastic gradient descent
            4. Logistic regression
            5. Support vector machines
          4. The Random forest algorithm
          5. Text clustering
            1. K-means
          6. Topic modeling in text
            1. Installing gensim
          7. References
          8. Summary
        7. 7. Web Crawling
          1. Web crawlers
          2. Writing your first crawler
          3. Data flow in Scrapy
            1. The Scrapy shell
            2. Items
          4. The Sitemap spider
          5. The item pipeline
          6. External references
          7. Summary
        8. 8. Using NLTK with Other Python Libraries
          1. NumPy
            1. ndarray
              1. Indexing
            2. Basic operations
            3. Extracting data from an array
            4. Complex matrix operations
              1. Reshaping and stacking
              2. Random numbers
          2. SciPy
            1. Linear algebra
            2. Eigenvalues and eigenvectors
            3. The sparse matrix
            4. Optimization
          3. pandas
            1. Reading data
            2. Series data
            3. Column transformation
            4. Noisy data
          4. matplotlib
            1. Subplot
            2. Adding an axis
            3. A scatter plot
            4. A bar plot
            5. 3D plots
          5. External references
          6. Summary
        9. 9. Social Media Mining in Python
          1. Data collection
            1. Twitter
          2. Data extraction
            1. Trending topics
          3. Geovisualization
            1. Influencers detection
            2. Facebook
            3. Influencer friends
          4. Summary
        10. 10. Text Mining at Scale
          1. Different ways of using Python on Hadoop
            1. Python streaming
            2. Hive/Pig UDF
            3. Streaming wrappers
          2. NLTK on Hadoop
            1. A UDF
            2. Python streaming
          3. Scikit-learn on Hadoop
          4. PySpark
          5. Summary
      7. 2. Module 2
        1. 1. Tokenizing Text and WordNet Basics
          1. Introduction
          2. Tokenizing text into sentences
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Tokenizing sentences in other languages
            5. See also
          3. Tokenizing sentences into words
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Separating contractions
              2. PunktWordTokenizer
              3. WordPunctTokenizer
            4. See also
          4. Tokenizing sentences using regular expressions
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Simple whitespace tokenizer
            5. See also
          5. Training a sentence tokenizer
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          6. Filtering stopwords in a tokenized sentence
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          7. Looking up Synsets for a word in WordNet
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Working with hypernyms
              2. Part of speech (POS)
            5. See also
          8. Looking up lemmas and synonyms in WordNet
            1. How to do it...
            2. How it works...
            3. There's more...
              1. All possible synonyms
              2. Antonyms
            4. See also
          9. Calculating WordNet Synset similarity
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Comparing verbs
              2. Path and Leacock Chodorow (LCH) similarity
            4. See also
          10. Discovering word collocations
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Scoring functions
              2. Scoring ngrams
            5. See also
        2. 2. Replacing and Correcting Words
          1. Introduction
          2. Stemming words
            1. How to do it...
            2. How it works...
            3. There's more...
              1. The LancasterStemmer class
              2. The RegexpStemmer class
              3. The SnowballStemmer class
            4. See also
          3. Lemmatizing words with WordNet
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Combining stemming with lemmatization
            5. See also
          4. Replacing words matching regular expressions
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Replacement before tokenization
            5. See also
          5. Removing repeating characters
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          6. Spelling correction with Enchant
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. The en_GB dictionary
              2. Personal word lists
            5. See also
          7. Replacing synonyms
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. CSV synonym replacement
              2. YAML synonym replacement
            5. See also
          8. Replacing negations with antonyms
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
        3. 3. Creating Custom Corpora
          1. Introduction
          2. Setting up a custom corpus
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Loading a YAML file
            5. See also
          3. Creating a wordlist corpus
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Names wordlist corpus
              2. English words corpus
            5. See also
          4. Creating a part-of-speech tagged word corpus
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Customizing the word tokenizer
              2. Customizing the sentence tokenizer
              3. Customizing the paragraph block reader
              4. Customizing the tag separator
              5. Converting tags to a universal tagset
            5. See also
          5. Creating a chunked phrase corpus
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Tree leaves
              2. Treebank chunk corpus
              3. CoNLL2000 corpus
            5. See also
          6. Creating a categorized text corpus
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Category file
              2. Categorized tagged corpus reader
              3. Categorized corpora
            5. See also
          7. Creating a categorized chunk corpus reader
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Categorized CoNLL chunk corpus reader
            5. See also
          8. Lazy corpus loading
            1. How to do it...
            2. How it works...
            3. There's more...
          9. Creating a custom corpus view
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Block reader functions
              2. Pickle corpus view
              3. Concatenated corpus view
            4. See also
          10. Creating a MongoDB-backed corpus reader
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          11. Corpus editing with file locking
            1. Getting ready
            2. How to do it...
            3. How it works...
        4. 4. Part-of-speech Tagging
          1. Introduction
          2. Default tagging
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Evaluating accuracy
              2. Tagging sentences
              3. Untagging a tagged sentence
            5. See also
          3. Training a unigram part-of-speech tagger
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Overriding the context model
              2. Minimum frequency cutoff
            4. See also
          4. Combining taggers with backoff tagging
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Saving and loading a trained tagger with pickle
            4. See also
          5. Training and combining ngram taggers
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Quadgram tagger
            5. See also
          6. Creating a model of likely word tags
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          7. Tagging with regular expressions
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          8. Affix tagging
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Working with min_stem_length
            4. See also
          9. Training a Brill tagger
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Tracing
            4. See also
          10. Training the TnT tagger
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Controlling the beam search
              2. Significance of capitalization
            4. See also
          11. Using WordNet for tagging
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. See also
          12. Tagging proper names
            1. How to do it...
            2. How it works...
            3. See also
          13. Classifier-based tagging
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Detecting features with a custom feature detector
              2. Setting a cutoff probability
              3. Using a pre-trained classifier
            4. See also
          14. Training a tagger with NLTK-Trainer
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Saving a pickled tagger
              2. Training on a custom corpus
              3. Training with universal tags
              4. Analyzing a tagger against a tagged corpus
              5. Analyzing a tagged corpus
            4. See also
        5. 5. Extracting Chunks
          1. Introduction
          2. Chunking and chinking with regular expressions
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Parsing different chunk types
              2. Parsing alternative patterns
              3. Chunk rule with context
            5. See also
          3. Merging and splitting chunks with regular expressions
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Specifying rule descriptions
            4. See also
          4. Expanding and removing chunks with regular expressions
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          5. Partial parsing with regular expressions
            1. How to do it...
            2. How it works...
            3. There's more...
              1. The ChunkScore metrics
              2. Looping and tracing chunk rules
            4. See also
          6. Training a tagger-based chunker
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Using different taggers
            4. See also
          7. Classification-based chunking
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Using a different classifier builder
            4. See also
          8. Extracting named entities
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Binary named entity extraction
            4. See also
          9. Extracting proper noun chunks
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          10. Extracting location chunks
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          11. Training a named entity chunker
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          12. Training a chunker with NLTK-Trainer
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Saving a pickled chunker
              2. Training a named entity chunker
              3. Training on a custom corpus
              4. Training on parse trees
              5. Analyzing a chunker against a chunked corpus
              6. Analyzing a chunked corpus
            4. See also
        6. 6. Transforming Chunks and Trees
          1. Introduction
          2. Filtering insignificant words from a sentence
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          3. Correcting verb forms
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. See also
          4. Swapping verb phrases
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          5. Swapping noun cardinals
            1. How to do it...
            2. How it works...
            3. See also
          6. Swapping infinitive phrases
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          7. Singularizing plural nouns
            1. How to do it...
            2. How it works...
            3. See also
          8. Chaining chunk transformations
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          9. Converting a chunk tree to text
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          10. Flattening a deep tree
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. The cess_esp and cess_cat treebank
            5. See also
          11. Creating a shallow tree
            1. How to do it...
            2. How it works...
            3. See also
          12. Converting tree labels
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. See also
        7. 7. Text Classification
          1. Introduction
          2. Bag of words feature extraction
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Filtering stopwords
              2. Including significant bigrams
            4. See also
          3. Training a Naive Bayes classifier
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Classification probability
              2. Most informative features
              3. Training estimator
              4. Manual training
            5. See also
          4. Training a decision tree classifier
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Controlling uncertainty with entropy_cutoff
              2. Controlling tree depth with depth_cutoff
              3. Controlling decisions with support_cutoff
            4. See also
          5. Training a maximum entropy classifier
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Megam algorithm
            5. See also
          6. Training scikit-learn classifiers
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Comparing Naive Bayes algorithms
              2. Training with logistic regression
              3. Training with LinearSVC
            5. See also
          7. Measuring precision and recall of a classifier
            1. How to do it...
            2. How it works...
            3. There's more...
              1. F-measure
            4. See also
          8. Calculating high information words
            1. How to do it...
            2. How it works...
            3. There's more...
              1. The MaxentClassifier class with high information words
              2. The DecisionTreeClassifier class with high information words
              3. The SklearnClassifier class with high information words
            4. See also
          9. Combining classifiers with voting
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. See also
          10. Classifying with multiple binary classifiers
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          11. Training a classifier with NLTK-Trainer
            1. How to do it...
            2. How it works...
            3. There's more...
              1. Saving a pickled classifier
              2. Using different training instances
              3. The most informative features
              4. The Maxent and LogisticRegression classifiers
              5. SVMs
              6. Combining classifiers
              7. High information words and bigrams
              8. Cross-fold validation
              9. Analyzing a classifier
            4. See also
        8. 8. Distributed Processing and Handling Large Datasets
          1. Introduction
          2. Distributed tagging with execnet
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Creating multiple channels
              2. Local versus remote gateways
            5. See also
          3. Distributed chunking with execnet
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Python subprocesses
            5. See also
          4. Parallel list processing with execnet
            1. How to do it...
            2. How it works...
            3. There's more...
            4. See also
          5. Storing a frequency distribution in Redis
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          6. Storing a conditional frequency distribution in Redis
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          7. Storing an ordered dictionary in Redis
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          8. Distributed word scoring with Redis and execnet
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
        9. 9. Parsing Specific Data Types
          1. Introduction
          2. Parsing dates and times with dateutil
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          3. Timezone lookup and conversion
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Local timezone
              2. Custom offsets
            5. See also
          4. Extracting URLs from HTML with lxml
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Extracting links directly
              2. Parsing HTML from URLs or files
              3. Extracting links with XPaths
            5. See also
          5. Cleaning and stripping HTML
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
            5. See also
          6. Converting HTML entities with BeautifulSoup
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Extracting URLs with BeautifulSoup
            5. See also
          7. Detecting and converting character encodings
            1. Getting ready
            2. How to do it...
            3. How it works...
            4. There's more...
              1. Converting to ASCII
              2. UnicodeDammit conversion
            5. See also
        10. A. Penn Treebank Part-of-speech Tags
      8. 3. Module 3
        1. 1. Working with Strings
          1. Tokenization
            1. Tokenization of text into sentences
            2. Tokenization of text in other languages
            3. Tokenization of sentences into words
            4. Tokenization using TreebankWordTokenizer
            5. Tokenization using regular expressions
          2. Normalization
            1. Eliminating punctuation
              1. Conversion into lowercase and uppercase
            2. Dealing with stop words
              1. Calculate stopwords in English
          3. Substituting and correcting tokens
            1. Replacing words using regular expressions
              1. Example of the replacement of a text with another text
            2. Performing substitution before tokenization
            3. Dealing with repeating characters
              1. Example of deleting repeating characters
            4. Replacing a word with its synonym
              1. Example of substituting a word with its synonym
          4. Applying Zipf's law to text
          5. Similarity measures
            1. Applying similarity measures using the edit distance algorithm
            2. Applying similarity measures using Jaccard's Coefficient
            3. Applying similarity measures using the Smith-Waterman distance
            4. Other string similarity metrics
          6. Summary
        2. 2. Statistical Language Modeling
          1. Understanding word frequency
            1. Develop MLE for a given text
            2. Hidden Markov Model estimation
          2. Applying smoothing on the MLE model
            1. Add-one smoothing
            2. Good-Turing
            3. Kneser-Ney estimation
            4. Witten-Bell estimation
          3. Develop a back-off mechanism for MLE
          4. Applying interpolation on data to get mix and match
          5. Evaluate a language model through perplexity
          6. Applying Metropolis-Hastings in modeling languages
          7. Applying Gibbs sampling in language processing
          8. Summary
        3. 3. Morphology – Getting Our Feet Wet
          1. Introducing morphology
          2. Understanding stemmer
          3. Understanding lemmatization
          4. Developing a stemmer for non-English language
          5. Morphological analyzer
          6. Morphological generator
          7. Search engine
          8. Summary
        4. 4. Parts-of-Speech Tagging – Identifying Words
          1. Introducing parts-of-speech tagging
            1. Default tagging
          2. Creating POS-tagged corpora
          3. Selecting a machine learning algorithm
          4. Statistical modeling involving the n-gram approach
          5. Developing a chunker using POS-tagged corpora
          6. Summary
        5. 5. Parsing – Analyzing Training Data
          1. Introducing parsing
          2. Treebank construction
          3. Extracting Context Free Grammar (CFG) rules from Treebank
          4. Creating a probabilistic Context Free Grammar from CFG
          5. CYK chart parsing algorithm
          6. Earley chart parsing algorithm
          7. Summary
        6. 6. Semantic Analysis – Meaning Matters
          1. Introducing semantic analysis
            1. Introducing NER
            2. A NER system using Hidden Markov Model
            3. Training NER using Machine Learning Toolkits
            4. NER using POS tagging
          2. Generation of the synset id from WordNet
          3. Disambiguating senses using WordNet
          4. Summary
        7. 7. Sentiment Analysis – I Am Happy
          1. Introducing sentiment analysis
            1. Sentiment analysis using NER
            2. Sentiment analysis using machine learning
            3. Evaluation of the NER system
          2. Summary
        8. 8. Information Retrieval – Accessing Information
          1. Introducing information retrieval
            1. Stop word removal
            2. Information retrieval using a vector space model
          2. Vector space scoring and query operator interaction
          3. Developing an IR system using latent semantic indexing
          4. Text summarization
          5. Question-answering system
          6. Summary
        9. 9. Discourse Analysis – Knowing Is Believing
          1. Introducing discourse analysis
            1. Discourse analysis using Centering Theory
            2. Anaphora resolution
          2. Summary
        10. 10. Evaluation of NLP Systems – Analyzing Performance
          1. The need for evaluation of NLP systems
            1. Evaluation of NLP tools (POS taggers, stemmers, and morphological analyzers)
            2. Parser evaluation using gold data
          2. Evaluation of IR system
          3. Metrics for error identification
          4. Metrics based on lexical matching
          5. Metrics based on syntactic matching
          6. Metrics using shallow semantic matching
          7. Summary
      9. B. Bibliography
      10. Index