R for Data Science by Dan Toomey

Chapter 3. Text Mining

Much of the data available today is text: unstructured, massive in volume, and tremendously varied. In this chapter, we will look at the tools available in R for extracting useful information from text.

This chapter describes different ways of mining text. We will cover the following topics:

  • Examining the text in various ways
    • Converting text to lowercase
    • Removing punctuation
    • Removing numbers
    • Removing URLs
    • Removing stop words
    • Using the stems of words rather than instances
    • Building a document matrix delineating uses
  • XML processing, both orthogonal and of varying degrees
  • Examples
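
As a rough preview of the cleaning steps listed above, here is a minimal sketch in base R. It is not the chapter's own code: the sample sentences, the tiny stop word list, and the helper names clean_text and tokenize are all assumptions made for illustration. Stemming is left out because it needs an add-on package.

```r
# Hypothetical sample documents, invented for illustration.
docs <- c(
  "Text Mining in R is fun! See http://example.com for 3 examples.",
  "The tools in R make mining text easy."
)

# A tiny hand-picked stop word list (an assumption; real lists are much longer).
stop_words <- c("in", "is", "the", "for", "see")

# Hypothetical helper: apply the cleaning steps in order.
clean_text <- function(x) {
  x <- tolower(x)                          # convert text to lowercase
  x <- gsub("http[^[:space:]]*", "", x)    # remove URLs (before punctuation is stripped)
  x <- gsub("[[:punct:]]", "", x)          # remove punctuation
  x <- gsub("[[:digit:]]", "", x)          # remove numbers
  x <- gsub("[[:space:]]+", " ", x)        # collapse runs of whitespace
  trimws(x)
}

# Hypothetical helper: split one cleaned document into words and drop stop words.
tokenize <- function(x) {
  words <- strsplit(clean_text(x), " ", fixed = TRUE)[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
}

tokens <- lapply(docs, tokenize)

# Build a simple document-term matrix: one row per document, one column per term.
terms <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = terms))),
  integer(length(terms))
))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), terms)

print(dtm)
```

URLs are removed before punctuation here because stripping punctuation first would break a URL into fragments that the URL pattern no longer matches.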

Packages

While the standard R system has a number of features and functions available, one of the better aspects of R is the use of packages ...
