Text Processing using NLTK in Python

Video Description

Learn the tricks and tips that will help you design Text Analytics solutions

About This Video

  • Independent solutions that will teach you how to efficiently perform Natural Language Processing in Python
  • Use dictionaries to create your own named entities using this easy-to-follow guide
  • Learn how to implement NLTK for various scenarios with the help of example-rich solutions to take you beyond basic Natural Language Processing

In Detail

Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with the interactions between computers and human (natural) languages. This course consists of self-contained videos that teach various aspects of performing Natural Language Processing with NLTK, the leading Python platform for the task.

In this course, you will learn what WordNet is and explore its features and usage. You will learn how to extract raw text from web sources and apply some critical pre-processing steps. You will also become familiar with pattern matching as a way to analyze text.
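As a rough illustration of the pattern-matching idea mentioned above, the sketch below tokenizes a sentence with a regular expression and counts the wh-words in it. It uses only Python's standard `re` and `collections` modules rather than NLTK, and the sample text and function names are hypothetical; none of it is taken from the course material.

```python
import re
from collections import Counter

# Hypothetical sample text, made up for this illustration.
SAMPLE = "What is NLTK? Where do corpora come from, and why does tokenization matter?"


def tokenize(text):
    """Split text into lowercase word tokens using a simple regex."""
    return re.findall(r"[a-z']+", text.lower())


def count_wh_words(tokens):
    """Count tokens that start with 'wh' (what, where, why, ...)."""
    return Counter(t for t in tokens if t.startswith("wh"))


tokens = tokenize(SAMPLE)
wh_counts = count_wh_words(tokens)
print(wh_counts)
```

A real pipeline would more likely start from one of NLTK's tokenizers and corpora; the regex here is only a stand-in to keep the example dependency-free.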

By the end of the course, you will have worked through a range of solutions covering natural language understanding, Natural Language Processing, and syntactic analysis, and you will be confident applying them.

All the code and supporting files for this course are available on GitHub at https://github.com/PacktPublishing/Text-Processing-using-NLTK-in-Python

Table of Contents

  1. Chapter 1 : Corpus and WordNet
    1. The Course Overview 00:03:23
    2. Accessing In-Built Corpora 00:04:07
    3. Downloading an External Corpus 00:03:33
    4. Counting All the wh-words 00:03:43
    5. Frequency Distribution Operations 00:02:40
    6. WordNet 00:03:10
    7. The Concepts of Hyponyms and Hypernyms Using WordNet 00:03:40
    8. Compute the Average Polysemy According to WordNet 00:03:29
  2. Chapter 2 : Raw Text, Sourcing, and Normalization
    1. The Importance of String Operations 00:03:10
    2. Getting Deeper with String Operations 00:02:58
    3. Reading a PDF File in Python 00:02:54
    4. Reading Word Documents in Python 00:03:56
    5. Creating a User-Defined Corpus 00:04:30
    6. Reading Contents from an RSS Feed 00:02:50
    7. HTML Parsing Using BeautifulSoup 00:03:50
  3. Chapter 3 : Pre-Processing
    1. Tokenization – Learning to Use the Inbuilt Tokenizers of NLTK 00:02:52
    2. Stemming – Learning to Use the Inbuilt Stemmers of NLTK 00:02:28
    3. Lemmatization – Learning to Use the WordNetLemmatizer of NLTK 00:02:20
    4. Stopwords – Learning to Use the Stopwords Corpus 00:03:14
    5. Edit Distance – Writing Your Own Algorithm to Find Edit Distance Between Two Strings 00:02:49
    6. Processing Two Short Stories and Extracting the Common Vocabulary 00:02:38
  4. Chapter 4 : Regular Expressions
    1. Regular Expression – Learning to Use *, +, and ? 00:03:24
    2. Regular Expression – Learning to Use Non-Start and Non-End of Word 00:03:20
    3. Searching Multiple Literal Strings and Substrings Occurrences 00:01:54
    4. Creating Date Regex 00:02:41
    5. Making Abbreviations 00:01:19
    6. Learning to Write Your Own Regex Tokenizer 00:01:22
    7. Learning to Write Your Own Regex Stemmer 00:02:14
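
To give a sense of the "write your own algorithm" topics listed in Chapter 3, here is a minimal sketch of edit distance between two strings. It assumes the standard Levenshtein formulation (insertions, deletions, and substitutions each cost 1) computed by dynamic programming; the course's own implementation may differ, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Return the Levenshtein distance between strings a and b.

    Assumes unit cost for insertion, deletion, and substitution.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Distance from the empty prefix of a to every prefix of b.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Distance from a[:i] to the empty prefix of b is i deletions.
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete ch_a
                curr[j - 1] + 1,                  # insert ch_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Keeping only the previous and current rows reduces memory use to the length of the second string, which is usually sufficient when only the distance, not the edit sequence, is needed.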