Preface

Lex and yacc are tools designed for writers of compilers and interpreters, although they are also useful for many applications that will interest the noncompiler writer. Any application that looks for patterns in its input, or has an input or command language, is a good candidate for lex and yacc. Furthermore, they allow for rapid application prototyping, easy modification, and simple maintenance of programs. To stimulate your imagination, here are a few things people have used lex and yacc to develop:

  • The desktop calculator bc

  • The tools eqn and pic, typesetting preprocessors for mathematical equations and complex pictures.

  • PCC, the Portable C Compiler used with many UNIX systems, and GCC, the GNU C Compiler

  • A menu compiler

  • A SQL data base language syntax checker

  • The lex program itself

What’s New in the Second Edition

We have made extensive revisions in this new second edition. Major changes include:

  • Completely rewritten introductory Chapters 1-3

  • New Chapter 5 with a full SQL grammar

  • New, much more extensive reference chapters

  • Full coverage of all major MS-DOS and UNIX versions of lex and yacc, including AT&T lex and yacc, Berkeley yacc, flex, GNU bison, MKS lex and yacc, and Abraxas PCYACC

  • Coverage of the new POSIX 1003.2 standard versions of lex and yacc

Scope of This Book

Chapter 1, Lex and Yacc, gives an overview of how and why lex and yacc are used to create compilers and interpreters, and demonstrates some small lex and yacc applications. It also introduces basic terms we use throughout the book.

Chapter 2, Using Lex, describes how to use lex. It develops lex applications that count words in files, analyze program command switches and arguments, and compute statistics on C programs.

Chapter 3, Using Yacc, gives a full example using lex and yacc to develop a fully functional desktop calculator.

Chapter 4, A Menu Generation Language, demonstrates how to use lex and yacc to develop a menu generator.

Chapter 5, Parsing SQL, develops a parser for the full SQL relational data base language. First we use the parser as a syntax checker, then extend it into a simple preprocessor for SQL embedded in C programs.

Chapter 6, A Reference for Lex Specifications, and Chapter 7, A Reference for Yacc Grammars, provide detailed descriptions of the features and options available to the lex and yacc programmer. These chapters and the two that follow provide technical information for the now experienced lex and yacc programmer to use while developing new lex and yacc applications.

Chapter 8, Yacc Ambiguities and Conflicts, explains yacc ambiguities and conflicts, which are problems that keep yacc from parsing a grammar correctly. It then develops methods that can be used to locate and correct such problems.

Chapter 9, Error Reporting and Recovery, discusses techniques that the compiler or interpreter designer can use to locate, recognize, and report errors in the compiler input.

Appendix A, AT&T Lex, describes the command-line syntax of AT&T lex and the error messages it reports and suggests possible solutions.

Appendix B, AT&T Yacc, describes the command-line syntax of AT&T yacc and lists errors reported by yacc. It provides examples of code which can cause such errors and suggests possible solutions.

Appendix C, Berkeley Yacc, describes the command-line syntax of Berkeley yacc, a widely used free version of yacc distributed with Berkeley UNIX, and lists errors reported by Berkeley yacc with suggested solutions.

Appendix D, GNU Bison, discusses differences found in bison, the Free Software Foundation’s implementation of yacc.

Appendix E, Flex, discusses flex, a widely used free version of lex, lists differences from other versions, and lists errors reported by flex with suggested solutions.

Appendix F, MKS Lex and Yacc, discusses the MS-DOS and OS/2 version of lex and yacc from Mortice Kern Systems.

Appendix G, Abraxas Lex and Yacc, discusses PCYACC, the MS-DOS and OS/2 versions of lex and yacc from Abraxas Software.

Appendix H, POSIX Lex and Yacc, discusses the versions of lex and yacc defined by the IEEE POSIX 1003.2 standard.

Appendix I, MGL Compiler Code, provides the complete source code for the menu generation language compiler discussed in Chapter 4.

Appendix J, SQL Parser Code, provides the complete source code and a cross-reference for the SQL parser discussed in Chapter 5.

The Glossary lists technical terms from language and compiler theory.

The Bibliography lists other documentation on lex and yacc, as well as helpful books on compiler design.

We presume the reader is familiar with C, as most examples are in C, lex, or yacc, with the remainder being in the special purpose languages developed within the text.

Availability of Lex and Yacc

Lex and yacc were both developed at Bell Laboratories in the 1970s. Yacc was the first of the two, developed by Stephen C. Johnson. Lex was designed by Mike Lesk and Eric Schmidt to work with yacc. Both lex and yacc have been standard UNIX utilities since 7th Edition UNIX. System V and older versions of BSD use the original AT&T versions, while the newest version of BSD uses flex (see below) and Berkeley yacc. The articles written by the developers remain the primary source of information on lex and yacc.

The GNU Project of the Free Software Foundation distributes bison, a yacc replacement; bison was written by Robert Corbett and Richard Stallman. The bison manual, written by Charles Donnelly and Richard Stallman, is excellent, especially for referencing specific features. Appendix D discusses bison.

BSD and the GNU Project also distribute flex (Fast Lexical Analyzer Generator), “a rewrite of lex intended to right some of that tool’s deficiencies,” according to its reference page. Flex was originally written by Jef Poskanzer; Vern Paxson and Van Jacobson have considerably improved it and Vern currently maintains it. Appendix E covers topics specific to flex.

There are at least two versions of lex and yacc available for MS-DOS and OS/2 machines. MKS (Mortice Kern Systems Inc.), publishers of the MKS Toolkit, offers lex and yacc as a separate product that supports many PC C compilers. MKS lex and yacc comes with a very good manual. Appendix F covers MKS lex and yacc. Abraxas Software publishes PCYACC, a version of lex and yacc which comes with sample parsers for a dozen widely used programming languages. Appendix G covers Abraxas’ version of lex and yacc.

Sample Programs

The programs in this book are available free from UUNET (that is, free except for UUNET’s usual connect-time charges). If you have access to UUNET, you can retrieve the source code using UUCP or FTP. For UUCP, find a machine with direct access to UUNET, and type the following command:

    uucp uunet\!~/nutshell/lexyacc/progs.tar.Z yourhost\!~/yourname/

The backslashes can be omitted if you use the Bourne shell (sh) instead of the C shell (csh). The file should appear some time later (up to a day or more) in the directory /usr/spool/uucppublic/yourname. If you don’t have an account but would like one so that you can get electronic mail, then contact UUNET at 703-204-8000.

To use ftp, find a machine with direct access to the Internet. Here is a sample session, with commands in boldface.

    % ftp ftp.oreilly.com
    Connected to ftp.oreilly.com.
    220 FTP server (Version 5.99 Wed May 23 14:40:19 EDT 1990) ready.
    Name (ftp.oreilly.com:yourname): anonymous
    331 Guest login ok, send ident as password.
    Password: ambar@ora.com 
            (use your user name and host here)
    230 Guest login ok, access restrictions apply.
    ftp> cd published/oreilly/nutshell/lexyacc
    250 CWD command successful.
    ftp> binary (you must specify binary transfer for compressed files)
    200 Type set to I.
    ftp> get progs.tar.Z
    200 PORT command successful.
    150 Opening BINARY mode data connection for progs.tar.Z.
    226 Transfer complete.
    ftp> quit
    221 Goodbye.
    %

The file is a compressed tar archive. To extract files once you have retrieved the archive, type:

    % zcat progs.tar.Z | tar xf -

System V systems require the following tar command instead:

    % zcat progs.tar.Z | tar xof -

Conventions Used in This Handbook

The following conventions are used in this book:

Bold

is used for statements and functions, identifiers, and program names.

Italic

is used for file, directory, and command names when they appear in the body of a paragraph as well as for data types and to emphasize new terms and concepts when they are introduced.

Constant Width

is used in examples to show the contents of files or the output from commands.

Constant Bold

is used in examples to show command lines and options that you type literally.

Quotes

are used to identify a code fragment in explanatory text. System messages, signs, and symbols are quoted as well.

%

is the shell prompt.

[]

surround optional elements in a description of program syntax. (Don’t type the brackets themselves.)

Acknowledgments

The first edition of this book began with Tony Mason’s MGL and SGL compilers. Tony developed most of the material in this book, working with Dale Dougherty to make it a “Nutshell.” Doug Brown contributed Chapter 8, Yacc Ambiguities and Conflicts. Dale also edited and revised portions of the book. Tim O’Reilly made it a better book by withholding his editorial blessing until he found what he was looking for in the book. Thanks to Butch Anton, Ed Engler, and Mike Loukides for their comments on technical content. Thanks also to John W. Lockhart for reading a draft with an eye for stylistic issues. And thanks to Chris Reilley for his work on the graphics. Finally, Ruth Terry brought the book into print with her usual diligence and her sharp eye for every editorial detail. Though she was trying to work odd hours to also care for her family, it seemed she was caring for this book all hours of the day.

For the second edition, Tony rewrote Chapters 1 and 2, and Doug updated Chapter 8. John Levine wrote Chapters 3, 5, 6, 7, and most of the appendices, and edited the rest of the text. Thanks to the technical reviewers, Bill Burke, Warren Carithers, Jon Mauney, Gary Merrill, Eugene Miya, Andy Oram, Bill Torcaso, and particularly Vern Paxson, whose detailed page-by-page suggestions made the fine points much clearer. Margaret Levine Young’s blue pencil (which was actually pink) tightened up the text and gave the book editorial consistency. She also compiled most of the index. Chris Reilley again did the graphics, and Donna Woonteiler did the final editing and shepherded the book through the production process.
