Chapter 5. Natural Language Tools

Sean Burke, author of Perl and LWP and a professional linguist, once described artificial intelligence as the study of programming situations where you either don’t know what you want or don’t know how to get it.

Natural-language processing, or NLP, is the application of AI techniques to answer questions about text written in a human language: what does it mean, what other documents is it like, and so on. As Perl is often described as a text-processing language, it should come as no surprise that a great many modules and techniques are available in Perl for carrying out NLP-related tasks.

But as we’ve seen so far in this book, the real strength of Perl lies not in the ease with which we can program particular techniques, but in the fact that so many of the techniques we need (to break texts into sentences and words, to correctly strip the endings off inflected words, to put the right endings back on again, and so on) have already been implemented and placed on CPAN. So in this chapter we’re going to take a tour of the natural language section of CPAN, and see how we can use its modules to slice and dice any natural-language text we need to deal with.

Perl and Natural Languages

There’s an especially good reason why Perl is used for handling natural language problems—Perl was created with natural languages in mind. In fact, Perl’s creator, Larry Wall, has a joint degree in natural and artificial languages and sees Perl as influenced by both branches ...
