A lexeme is a sequence of characters in the source program that matches the pattern for a token. Each token is described by a pattern, and one pattern may match many lexemes. The pattern for a keyword such as while matches exactly one lexeme, whereas the pattern for an identifier matches y, t, count, and any other valid name. As a result, a programming language defines a finite set of tokens but a potentially infinite number of lexemes.
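To make the pattern idea concrete, here is a minimal Python sketch that writes two token patterns as regular expressions and checks which lexemes each one matches. The pattern strings and the `matches` helper are illustrative assumptions, not taken from any particular compiler.

```python
import re

# Illustrative token patterns (assumed, not from a specific compiler).
# The keyword pattern matches exactly one lexeme; the identifier pattern
# matches an unbounded set of lexemes.
TOKEN_PATTERNS = {
    "WhileKeyword": r"while",
    "Identifier": r"[A-Za-z_][A-Za-z0-9_]*",
}


def matches(token: str, lexeme: str) -> bool:
    """Return True if the entire lexeme matches the token's pattern."""
    return re.fullmatch(TOKEN_PATTERNS[token], lexeme) is not None


for lexeme in ["while", "y", "t", "count", "x_1", "3y"]:
    kinds = [tok for tok in TOKEN_PATTERNS if matches(tok, lexeme)]
    print(f"{lexeme!r}: {kinds}")
```

Note that "while" also matches the identifier pattern. Lexers typically resolve this kind of overlap by treating reserved words as keywords rather than identifiers.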
The difference between a lexeme and a token is easiest to see in an example. Consider the following statement:
while (y >= t) y = y - 3;
A lexical analyzer (lexer) splits the preceding statement into the following lexemes and tokens:
| Lexeme | Token |
|--------|-------|
| while | WhileKeyword |
| ( | OpenParenToken |
| y | Identifier |
| >= | GreaterThanEqualsToken |
| t | Identifier |
| ) | CloseParenToken |
| ... | ... |
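The split shown in the table can be reproduced with a small regex-based lexer. The following Python sketch is one possible approach, not the author's implementation. The token names for lexemes the table does not list (EqualsToken, MinusToken, NumericLiteralToken, SemicolonToken) are assumptions chosen in the same naming style, and the helper names `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical.

```python
import re
from typing import List, Tuple

# Token specification, tried in list order at each position.
# Names not shown in the table (EqualsToken, MinusToken,
# NumericLiteralToken, SemicolonToken) are assumed for illustration.
TOKEN_SPEC = [
    ("Whitespace", r"\s+"),
    # Longer operators are listed first so they win over any shorter prefix.
    ("GreaterThanEqualsToken", r">="),
    ("OpenParenToken", r"\("),
    ("CloseParenToken", r"\)"),
    ("EqualsToken", r"="),
    ("MinusToken", r"-"),
    ("SemicolonToken", r";"),
    ("NumericLiteralToken", r"[0-9]+"),
    ("Word", r"[A-Za-z_][A-Za-z0-9_]*"),
]

# Reserved words are looked up after a word is matched.
KEYWORDS = {"while": "WhileKeyword"}

MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source into (lexeme, token) pairs, skipping whitespace."""
    result: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind = m.lastgroup
        lexeme = m.group()
        if kind == "Word":
            # A word is a keyword if reserved, otherwise an identifier.
            kind = KEYWORDS.get(lexeme, "Identifier")
        if kind != "Whitespace":
            result.append((lexeme, kind))
        pos = m.end()
    return result


for lexeme, token in tokenize("while (y >= t) y = y - 3;"):
    print(f"{lexeme:6} {token}")
```

Running it prints one lexeme and its token per line. The lexeme y appears three times, yet every occurrence maps to the same Identifier token, which is the distinction the table illustrates.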