This series provides an overview and working knowledge of Natural Language Processing (NLP) using Python's Natural Language Toolkit (NLTK) library within an Anaconda environment. It is intended for users who have basic Python programming knowledge and want to get started with NLP.
The tutorial starts with an introduction to data structures and regular expressions, then moves on to accessing and analyzing text with NLTK, and finally graduates to making predictions on text with scikit-learn, Python's machine learning library. Topics covered in this video series include:
- Setting up the Environment. After an overview of the video series, this clip shows how to install and run Python and Anaconda, along with the necessary libraries (including NLTK).
- Manipulating Data. Explores how to manipulate data in Python using strings, lists, tuples, dictionaries, and sets.
- Using Regular Expressions (Regex). Explores regular expressions in Python, including creating regex patterns, using the search() and findall() functions, using special characters, and applying pattern matching and string substitution.
- Accessing Files and Reading Text. Covers accessing files and reading text, including listing directories, reading plain-text (.txt) files, Microsoft Word (.docx) documents, and PDF files, and accessing NLTK corpora.
- Extracting, Cleaning, and Preprocessing Text, Part 1. Explores extracting, cleaning, and preprocessing text with sentence and word tokenization; bigrams, trigrams, and n-grams; stemming; lemmatization; and stop-word removal.
- Extracting, Cleaning, and Preprocessing Text, Part 2. Continues extracting, cleaning, and preprocessing text with part-of-speech (POS) tagging and named entity recognition.
- Analyzing Sentence Structure. Explains how to analyze sentence structure using syntax trees, chunking, chinking, and context-free grammars (CFGs).
- Classifying Text, Part 1. Introduces text classification with machine learning, covering the bag-of-words model, scikit-learn's CountVectorizer, and term frequency-inverse document frequency (TF-IDF).
- Classifying Text, Part 2. Continues text classification with machine learning, including converting text into features and labels, training a multinomial Naïve Bayes classifier, and evaluating results with a confusion matrix.
- Putting the Pieces Together: NLP Project on Sentiment Analysis. Applies everything covered so far to a dataset. This end-to-end NLP project draws on the topics from the previous tutorials to build a machine learning classifier for sentiment analysis.