The job of a Scanner, or lexical analyzer, is to identify low-level language constructs: language atoms such as identifiers, keywords, numbers, operators and string literals. Because these constructs can be described by regular languages, Scanner design is based on regular expressions, and a finite-state machine serves as its implementation model.
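As a rough illustration of the finite-state-machine model, the following is a minimal sketch of a hand-coded Scanner with three states (START, IDENT, NUMBER). The keyword set, the operator set, the token kind names and the function name `scan` are all assumptions made for this example; they are not taken from any particular language or from the text.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, lexeme); this representation is an assumption

KEYWORDS = {"if", "else", "while"}  # hypothetical keyword set
OPERATORS = set("+-*/()=")          # hypothetical single-character operators


def scan(source: str) -> List[Token]:
    """Scan the source one character at a time with a three-state machine."""
    tokens: List[Token] = []
    state = "START"
    lexeme = ""
    i = 0
    while i <= len(source):
        # A sentinel space after the last character flushes any pending lexeme.
        ch = source[i] if i < len(source) else " "
        if state == "START":
            if ch.isalpha() or ch == "_":
                state, lexeme = "IDENT", ch
            elif ch.isdigit():
                state, lexeme = "NUMBER", ch
            elif ch in OPERATORS:
                tokens.append(("OP", ch))
            elif not ch.isspace():
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            i += 1
        elif state == "IDENT":
            if ch.isalnum() or ch == "_":
                lexeme += ch
                i += 1
            else:
                # The identifier has ended; keywords are identifiers found
                # in the keyword set.
                kind = "KEYWORD" if lexeme in KEYWORDS else "IDENT"
                tokens.append((kind, lexeme))
                state = "START"  # re-examine ch in START without consuming it
        else:  # state == "NUMBER"
            if ch.isdigit():
                lexeme += ch
                i += 1
            else:
                tokens.append(("NUMBER", lexeme))
                state = "START"  # re-examine ch in START without consuming it
    return tokens
```

Under these assumptions, `scan("count = count + 42")` returns `[("IDENT", "count"), ("OP", "="), ("IDENT", "count"), ("OP", "+"), ("NUMBER", "42")]`. Each state corresponds to a simple regular expression (for instance, NUMBER accepts one or more digits), which is why the same Scanner could equally be specified with regular expressions and generated by a tool rather than written by hand.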
The basic scanning process examines the input source code character by character and identifies the tokens. A real Scanner has several further jobs, the first of which is to supply an internal representation of the atoms, called tokens, to the next phase, the parser. Consider a grammar for an arithmetic expression.
Fig. 3.1 Phases of a compiler: Scanner
E −> E + T ...