Chapter 4 Parsing and Extracting Features

Introduction

Tokens and Words

Lemmatization

POS Tags

Parsing Tree

Text Parsing Node in SAS Text Miner

Stemming and Synonyms

Identifying Parts of Speech

Using Start and Stop Lists

Spell Checking

Entities

Building Custom Entities Using SAS Contextual Extraction Studio

Summary

References

Introduction

In this chapter, we discuss the next, and perhaps the most important, step in the text mining process flow: text parsing. Chapters 2 and 3 showed how various methods collect and process textual documents. The next task is to convert the collected text documents (in unstructured form) to a vector representation (a structured form). Fundamentally, parsing is the first step in converting unstructured ...
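
To make the idea of converting unstructured text to a structured vector form concrete, here is a minimal sketch in Python. It is not how SAS Text Miner implements parsing; it assumes a simple bag-of-words representation, and the names `tokenize` and `term_document_matrix` are hypothetical helpers introduced only for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, rows) with one row of term counts per document.

    Assumption: a plain bag-of-words count, with no stemming, stop list,
    or part-of-speech tagging applied.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted set of every term seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


docs = ["The cat sat on the mat.", "The dog sat."]
vocabulary, rows = term_document_matrix(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
for row in rows:
    print(row)     # [1, 0, 1, 1, 1, 2] then [0, 1, 0, 0, 1, 1]
```

Each document becomes a fixed-length row of counts over a shared vocabulary, which is the kind of structured form that downstream analysis can work with. A real parser would typically refine the terms before counting, for example by applying the stemming, part-of-speech, and stop-list steps listed in this chapter's outline.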
