Lexical diversity

Consider a speaker who uses the term "allow" repeatedly throughout a speech, compared with another speaker who uses the terms "allow", "concur", "acquiesce", "accede", and "avow" to express the same idea. The second speech has greater lexical diversity than the first. Lexical diversity is widely regarded as an important parameter for rating a document's textual richness and effectiveness.
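To make the comparison concrete, the following Python sketch computes the simplest diversity measure, the type-token ratio (TTR: the number of distinct word types divided by the total number of tokens), for two hypothetical utterances modeled on the two speakers. The sentences, the regex tokenizer, and the function names are illustrative assumptions. They are not part of koRpus or of the book's own code.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """TTR = number of distinct types / number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty token list")
    return len(set(tokens)) / len(tokens)


# Hypothetical utterances: the first repeats "allow", the second varies the verb.
speech_a = "We allow it. They allow it. You allow it. I allow it."
speech_b = "We allow it. They concur, too. You acquiesce, too. I accede, too."

tokens_a = tokenize(speech_a)
tokens_b = tokenize(speech_b)
ttr_a = type_token_ratio(tokens_a)
ttr_b = type_token_ratio(tokens_b)

print(f"Speaker A: {len(tokens_a)} tokens, TTR = {ttr_a:.3f}")  # 12 tokens, TTR = 0.500
print(f"Speaker B: {len(tokens_b)} tokens, TTR = {ttr_b:.3f}")  # 12 tokens, TTR = 0.833
```

Both utterances were chosen to contain 12 tokens, so the difference in TTR comes only from vocabulary choice. Plain TTR falls as a text grows longer, which is why many of the measures listed next adjust for length or average over fixed-size windows.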

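Lexical diversity, in simple terms, is a measure of the breadth and variety of vocabulary used in a document. Widely used measures of lexical diversity include the type-token ratio (TTR), mean segmental TTR (MSTTR), moving-average TTR (MATTR), Herdan's C, Guiraud's R, Carroll's corrected TTR (CTTR), Dugast's Uber index (U), Summer's S, Yule's K, Maas' index, HD-D, the measure of textual lexical diversity (MTLD), and its moving-average variant (MTLD-MA).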
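Several of these measures are simple functions of N (the number of tokens) and V (the number of distinct types). The Python sketch below implements a few of them using their standard textbook definitions: TTR = V/N, Herdan's C = log V / log N, Guiraud's R = V / sqrt(N), and Carroll's CTTR = V / sqrt(2N). It also implements the two windowed variants, MSTTR (mean TTR over non-overlapping segments) and MATTR (mean TTR over a window that slides one token at a time). The function names, the whitespace tokenization, and the default segment and window size of 100 tokens are assumptions for illustration; this is not koRpus code, and koRpus may differ in details such as how it handles a trailing partial segment.

```python
import math


def basic_measures(tokens):
    """Return TTR and the length-adjusted variants C, R and CTTR.

    N = number of tokens, V = number of distinct types.
    """
    n = len(tokens)
    v = len(set(tokens))
    if n < 2:
        # log(1) == 0, so Herdan's C needs at least two tokens.
        raise ValueError("need at least two tokens")
    return {
        "TTR": v / n,                    # type-token ratio
        "C": math.log(v) / math.log(n),  # Herdan's C (log base cancels out)
        "R": v / math.sqrt(n),           # Guiraud's root TTR
        "CTTR": v / math.sqrt(2 * n),    # Carroll's corrected TTR
    }


def msttr(tokens, segment=100):
    """Mean segmental TTR: average TTR of consecutive, non-overlapping
    segments of `segment` tokens; a trailing partial segment is dropped."""
    segments = [tokens[i:i + segment]
                for i in range(0, len(tokens) - segment + 1, segment)]
    if not segments:
        raise ValueError("text is shorter than one segment")
    return sum(len(set(s)) / segment for s in segments) / len(segments)


def mattr(tokens, window=100):
    """Moving-average TTR: average TTR over every run of `window`
    consecutive tokens, sliding forward one token at a time."""
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    counts = {}
    for tok in tokens[:window]:
        counts[tok] = counts.get(tok, 0) + 1
    total = len(counts) / window
    for i in range(window, len(tokens)):
        # Slide the window: add the incoming token, drop the outgoing one.
        incoming, outgoing = tokens[i], tokens[i - window]
        counts[incoming] = counts.get(incoming, 0) + 1
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        total += len(counts) / window
    return total / (len(tokens) - window + 1)


# Toy example with 13 tokens and 8 types; small window sizes suit the short text.
tokens = "the cat sat on the mat and the dog sat on the log".split()
for name, value in basic_measures(tokens).items():
    print(f"{name}: {value:.3f}")  # TTR: 0.615, C: 0.811, R: 2.219, CTTR: 1.569
print(f"MSTTR: {msttr(tokens, segment=5):.3f}")  # 0.900
print(f"MATTR: {mattr(tokens, window=5):.3f}")   # 0.911
```

MATTR is updated incrementally with a frequency dictionary, so each slide costs constant time instead of recounting the whole window. In practice the segment and window sizes should be much smaller than the text; the values of 5 used here only suit the toy example.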

The koRpus package in R provides functions to estimate the lexical diversity (or lexical complexity) of a text.

If N is the total number of tokens ...