Mastering Natural Language Processing with Python

Book Description

Maximize your NLP capabilities while creating amazing NLP projects in Python

About This Book

  • Learn to implement various NLP tasks in Python

  • Gain insights into the current and budding research topics of NLP

  • This is a comprehensive step-by-step guide to help students and researchers create their own projects based on real-life applications

Who This Book Is For

This book is for intermediate-level NLP developers with a reasonable knowledge and understanding of Python.

What You Will Learn

  • Implement string matching algorithms and normalization techniques

  • Implement statistical language modeling techniques

  • Get an insight into developing a stemmer, lemmatizer, morphological analyzer, and morphological generator

  • Develop a search engine and implement POS tagging concepts and statistical modeling concepts involving the n-gram approach

  • Familiarize yourself with concepts such as the Treebank construct, CFG construction, the CYK Chart Parsing algorithm, and the Earley Chart Parsing algorithm

  • Develop an NER-based system and understand and apply the concepts of sentiment analysis

  • Understand and implement the concepts of Information Retrieval and text summarization

  • Develop a Discourse Analysis system and an Anaphora Resolution-based system

In Detail

Natural Language Processing (NLP) is a field of computational linguistics and artificial intelligence concerned with human-computer interaction. It aims to provide seamless interaction between computers and human beings and, with the help of machine learning, gives computers the ability to understand human speech.

This book shows you how to implement various NLP tasks in Python and gives you insight into best practices for designing and building NLP-based applications. It will help you build expertise and assist you in creating your own NLP projects using NLTK.

You will be guided step by step through applying machine learning tools to develop various models. We’ll show you how to create training data and how to implement major NLP applications such as Named Entity Recognition, Question Answering, Discourse Analysis, Transliteration, Word Sense Disambiguation, Information Retrieval, Sentiment Analysis, Text Summarization, and Anaphora Resolution.

Style and approach

This is an easy-to-follow guide, full of hands-on examples of real-world tasks. Each topic is explained and placed in context, and the more inquisitive reader will find further detail on the concepts used.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your Packt account. If you purchased this book elsewhere, you can register with Packt to obtain the code files.

Table of Contents

    1. Mastering Natural Language Processing with Python
      1. Table of Contents
      2. Mastering Natural Language Processing with Python
      3. Credits
      4. About the Authors
      5. About the Reviewer
        1. eBooks, discount offers, and more
          1. Why subscribe?
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Working with Strings
        1. Tokenization
          1. Tokenization of text into sentences
          2. Tokenization of text in other languages
          3. Tokenization of sentences into words
          4. Tokenization using TreebankWordTokenizer
          5. Tokenization using regular expressions
        2. Normalization
          1. Eliminating punctuation
            1. Conversion into lowercase and uppercase
          2. Dealing with stop words
            1. Calculate stopwords in English
        3. Substituting and correcting tokens
          1. Replacing words using regular expressions
            1. Example of the replacement of a text with another text
          2. Performing substitution before tokenization
          3. Dealing with repeating characters
            1. Example of deleting repeating characters
          4. Replacing a word with its synonym
            1. Example of substituting a word with its synonym
        4. Applying Zipf's law to text
        5. Similarity measures
          1. Applying similarity measures using the edit distance algorithm
          2. Applying similarity measures using Jaccard's Coefficient
          3. Applying similarity measures using the Smith Waterman distance
          4. Other string similarity metrics
        6. Summary
      9. 2. Statistical Language Modeling
        1. Understanding word frequency
          1. Develop MLE for a given text
          2. Hidden Markov Model estimation
        2. Applying smoothing on the MLE model
          1. Add-one smoothing
          2. Good-Turing
          3. Kneser-Ney estimation
          4. Witten-Bell estimation
        3. Develop a back-off mechanism for MLE
        4. Applying interpolation on data to get mix and match
        5. Evaluate a language model through perplexity
        6. Applying Metropolis-Hastings in modeling languages
        7. Applying Gibbs sampling in language processing
        8. Summary
      10. 3. Morphology – Getting Our Feet Wet
        1. Introducing morphology
        2. Understanding stemmer
        3. Understanding lemmatization
        4. Developing a stemmer for non-English language
        5. Morphological analyzer
        6. Morphological generator
        7. Search engine
        8. Summary
      11. 4. Parts-of-Speech Tagging – Identifying Words
        1. Introducing parts-of-speech tagging
          1. Default tagging
        2. Creating POS-tagged corpora
        3. Selecting a machine learning algorithm
        4. Statistical modeling involving the n-gram approach
        5. Developing a chunker using POS-tagged corpora
        6. Summary
      12. 5. Parsing – Analyzing Training Data
        1. Introducing parsing
        2. Treebank construction
        3. Extracting Context Free Grammar (CFG) rules from Treebank
        4. Creating a probabilistic Context Free Grammar from CFG
        5. CYK chart parsing algorithm
        6. Earley chart parsing algorithm
        7. Summary
      13. 6. Semantic Analysis – Meaning Matters
        1. Introducing semantic analysis
          1. Introducing NER
          2. An NER system using Hidden Markov Model
          3. Training NER using Machine Learning Toolkits
          4. NER using POS tagging
        2. Generation of the synset id from WordNet
        3. Disambiguating senses using WordNet
        4. Summary
      14. 7. Sentiment Analysis – I Am Happy
        1. Introducing sentiment analysis
          1. Sentiment analysis using NER
          2. Sentiment analysis using machine learning
          3. Evaluation of the NER system
        2. Summary
      15. 8. Information Retrieval – Accessing Information
        1. Introducing information retrieval
          1. Stop word removal
          2. Information retrieval using a vector space model
        2. Vector space scoring and query operator interaction
        3. Developing an IR system using latent semantic indexing
        4. Text summarization
        5. Question-answering system
        6. Summary
      16. 9. Discourse Analysis – Knowing Is Believing
        1. Introducing discourse analysis
          1. Discourse analysis using Centering Theory
          2. Anaphora resolution
        2. Summary
      17. 10. Evaluation of NLP Systems – Analyzing Performance
        1. The need for evaluation of NLP systems
          1. Evaluation of NLP tools (POS taggers, stemmers, and morphological analyzers)
          2. Parser evaluation using gold data
        2. Evaluation of IR system
        3. Metrics for error identification
        4. Metrics based on lexical matching
        5. Metrics based on syntactic matching
        6. Metrics using shallow semantic matching
        7. Summary
      18. Index