9.9. Customizing a Tokenizer
You can customize a tokenizer in three ways: by customizing one of the tokenizer's states, by changing which state the tokenizer enters given an initial character, or by adding an entirely new state.
9.9.1. Customizing a State
The preceding section shows how the CoffeeParser class creates a special tokenizer that allows spaces to appear in words. The tokenizer() method of this class retrieves a WordState object from a tokenizer t and updates it:
t.wordState().setWordChars(' ', ' ', true);
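The effect of such a call can be pictured as flipping entries in a per-character lookup table. The following is a minimal, self-contained sketch of that idea, not the book's WordState class: the class name WordCharsSketch, the 256-entry table, the default word characters, and the words() helper are all assumptions made for illustration. Only the shape of setWordChars (an inclusive range plus a flag) follows the call above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word state; NOT the book's WordState class.
public class WordCharsSketch {
    // One flag per character code 0..255: true means "may appear inside a word".
    private final boolean[] wordChar = new boolean[256];

    public WordCharsSketch() {
        // Assumed defaults for this sketch: letters and digits only.
        setWordChars('a', 'z', true);
        setWordChars('A', 'Z', true);
        setWordChars('0', '9', true);
    }

    // Mark every character in the inclusive range [from, to] as allowed or not.
    public void setWordChars(int from, int to, boolean allowed) {
        for (int c = Math.max(from, 0); c <= to && c < wordChar.length; c++) {
            wordChar[c] = allowed;
        }
    }

    // Split text into maximal runs of word characters; other characters separate words.
    public List<String> words(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < wordChar.length && wordChar[c]) {
                current.append(c);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        WordCharsSketch state = new WordCharsSketch();
        System.out.println(state.words("French Roast,Dark")); // [French, Roast, Dark]
        state.setWordChars(' ', ' ', true);                   // allow spaces inside words
        System.out.println(state.words("French Roast,Dark")); // [French Roast, Dark]
    }
}
```

Passing the same character as both ends of the range, as the line above does with ' ', changes the setting for that one character and leaves the rest of the table alone.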
9.9.2. Changing Which State the Tokenizer Enters
The example in Section 9.7.1 makes the tokenizer enter a quote state when it sees a “#”. It uses this line:
t.setCharacterState('#', '#', t.quoteState());
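One way to picture this call is as an update to a table that maps each possible initial character to the state that will consume it. The sketch below illustrates that table under stated assumptions. It is not the book's Tokenizer class: the class name StateTableSketch, the State enum, the 256-entry table, the default assignments, and the stateFor() helper are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for a tokenizer's state lookup; NOT the book's Tokenizer class.
public class StateTableSketch {
    // Hypothetical state kinds; a real tokenizer would hold state objects instead.
    public enum State { SYMBOL, WORD, NUMBER, QUOTE, WHITESPACE }

    // Entry i names the state entered when a token starts with character code i.
    private final State[] table = new State[256];

    public StateTableSketch() {
        // Assumed defaults for this sketch: everything is a symbol unless overridden.
        Arrays.fill(table, State.SYMBOL);
        setCharacterState(0, ' ', State.WHITESPACE);
        setCharacterState('a', 'z', State.WORD);
        setCharacterState('A', 'Z', State.WORD);
        setCharacterState('0', '9', State.NUMBER);
        setCharacterState('"', '"', State.QUOTE);
    }

    // Route every initial character in the inclusive range [from, to] to the given state.
    public void setCharacterState(int from, int to, State state) {
        for (int c = Math.max(from, 0); c <= to && c < table.length; c++) {
            table[c] = state;
        }
    }

    // Look up the state for an initial character; codes outside the table fall back to SYMBOL.
    public State stateFor(char c) {
        return c < table.length ? table[c] : State.SYMBOL;
    }

    public static void main(String[] args) {
        StateTableSketch t = new StateTableSketch();
        System.out.println(t.stateFor('#'));            // SYMBOL under the assumed defaults
        t.setCharacterState('#', '#', State.QUOTE);     // same shape as the call above
        System.out.println(t.stateFor('#'));            // QUOTE
    }
}
```

Because the table is indexed by the initial character only, rerouting “#” changes how tokens that begin with “#” are read without affecting any other character's entry.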
9.9.3. Adding ...