Chapter 8

Text Analytics

Text analytics is the use of software to analyze the written word. The goal of this type of analysis is to understand the author's intent, characteristics, and meaning. Although it is grounded in the well-established disciplines of linguistics, computational linguistics, statistics, machine learning, and information retrieval (some would add philosophy, psychology, and other social sciences), its application to traditional business scenarios is relatively recent, which makes it, by definition, heuristic.

Text analytics is distinct from content analytics: it considers electronic words, phrases, sentences, snippets, fragments, and documents, and it does not include analysis of the video or audio recordings themselves. Text can, of course, be extracted from either type of recording and transcribed to electronic text, either manually or with recognition software. Video and audio also carry attributes of their own, such as the time spent standing in a bank teller line captured on video or the length of a pause by a phone caller, but this chapter focuses solely on electronic text.
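To make the units of electronic text concrete, here is a minimal Python sketch that breaks a short document into sentences and then into words. It is an illustration only: the function names `split_sentences` and `split_words`, the sample text, and the deliberately naive splitting rules are all assumptions made for this example, not a method described in this chapter. Real text analytics software handles abbreviations, quotations, and other languages far more carefully.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence: str) -> List[str]:
    """Lowercase the sentence and keep runs of letters, digits, or
    apostrophes; all other punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


# Hypothetical sample document, invented for this illustration.
document = "The teller line was long. I waited ten minutes! Will it improve?"

sentences = split_sentences(document)
words = [split_words(s) for s in sentences]

for sentence, tokens in zip(sentences, words):
    print(sentence, "->", tokens)
```

Even this small sketch shows why the work is heuristic: a rule as simple as "a period ends a sentence" fails on text such as "Dr. Smith", so each splitting rule is a judgment call that has to be tested against the text at hand.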

Seldom is textual data independent of structured data. Structured data such as date, time, author, source, title, and so on, is often captured automatically by the operating system or word processing software being used by the text author. This type of automated, structured data capture is common to all computing systems—so the remaining body of these documents ...
