Appendix A

Overview of Perl for Text Mining

This appendix summarizes the basics of Perl in these areas: basic data structures, operators, branching and looping, functions, and regular expressions. The focus is on Perl’s text capabilities, and many references are made to code throughout this book.

The form of these code samples differs slightly from that of the samples elsewhere in this book. To save space, the output is placed at the end of the computer code.

To run Perl, first download it by going to http://www.perl.org/ [45] and following the instructions there. Second, type the statements into a file with the suffix .pl, for example, program.pl. Third, find out how to use your computer’s command line interface, which lets you type commands for execution. Fourth, type the statement below on the command line and then press the Enter key. The output will appear below it.

perl program.pl

Remember that Perl is case sensitive. For example, commands have to be in lowercase, and the three variables $cat, $Cat, and $CAT are all distinct. Finally, do not forget to use semicolons to end each statement.
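
As a minimal sketch of these points (not one of the book’s numbered code samples), the following statements could be saved as program.pl and run as described above. The variable values and the name $report are chosen only for illustration. Following the convention of this appendix, the output is placed at the end of the code.

```perl
use strict;
use warnings;

# Three distinct scalars: their names differ only in case.
my $cat = "lowercase";
my $Cat = "capitalized";
my $CAT = "uppercase";

# Every statement ends with a semicolon, and print is written in lowercase.
my $report = "$cat $Cat $CAT";
print "$report\n";

# OUTPUT: lowercase capitalized uppercase
```

Writing Print instead of print, or omitting one of the semicolons, would typically stop the program with an error before any output appears.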

A.1 BASIC DATA STRUCTURES

A programmer must be able to store and modify information, which is kept in scalar, array, and hash variables. We start with scalars, which store a single value and whose names always start with a dollar sign. First, consider the examples in code sample A.1, which demonstrates Perl’s two types of scalars, strings and numbers. If a string is used as a ...
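
The two kinds of scalars can be sketched as follows. This is an illustrative example, not a reproduction of code sample A.1, and all of the variable names and values are assumptions made for the sketch.

```perl
use strict;
use warnings;

# A scalar holds a single value, and its name starts with a dollar sign.
my $word  = "text";   # a string
my $count = 3;        # an integer
my $ratio = 2.5;      # a decimal number

# Numbers support arithmetic; strings are joined with the dot operator.
my $total = $count * $ratio;
my $label = $word . " mining";

print "$label: $total\n";

# OUTPUT: text mining: 7.5
```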
