Contents

List of Figures

List of Tables

Preface

Acknowledgments

1 Introduction

1.1 Overview of this Book

1.2 Text Mining and Related Fields

1.3 Advice for Reading this Book

2 Text Patterns

2.1 Introduction

2.2 Regular Expressions

2.3 Finding Words in a Text

2.4 Decomposing Poe’s “The Tell-Tale Heart” into Words

2.5 A Simple Concordance

2.6 First Attempt at Extracting Sentences

2.7 Regex Odds and Ends

2.8 References

Problems

3 Quantitative Text Summaries

3.1 Introduction

3.2 Scalars, Interpolation, and Context in Perl

3.3 Arrays and Context in Perl

3.4 Word Lengths in Poe’s “The Tell-Tale Heart”

3.5 Arrays and Functions

3.6 Hashes

3.7 Two Text Applications

3.8 Complex Data Structures

3.9 References

3.10 First Transition

Problems

4 Probability and Text Sampling

4.1 Introduction

4.2 Probability

4.3 Conditional Probability

4.4 Mean and Variance of Random Variables

4.5 The Bag-of-Words Model for Poe’s “The Black Cat”

4.6 The Effect of Sample Size

4.7 References

Problems

5 Applying Information Retrieval to Text Mining

5.1 Introduction

5.2 Counting Letters and Words

5.3 Text Counts and Vectors

5.4 The Term-Document Matrix Applied to Poe

5.5 Matrix Multiplication

5.6 Functions of Counts

5.7 Document Similarity

5.8 References

Problems

6 Concordance Lines and Corpus Linguistics

6.1 Introduction

6.2 Sampling

6.3 Corpus as Baseline

6.4 Concordancing

6.5 Collocations and Concordance Lines

6.6 Applications with References

6.7 Second Transition

Problems

7 Multivariate Techniques with Text
