9.9. Customizing a Tokenizer

You can customize a tokenizer in three ways: by customizing one of the tokenizer's states, by changing which state the tokenizer enters given an initial character, or by adding an entirely new state.

9.9.1. Customizing a State

The preceding section shows how the CoffeeParser class creates a special tokenizer that allows spaces to appear in words. The tokenizer() method of this class retrieves the WordState object from a tokenizer t and tells it to treat the space character as a word character:

t.wordState().setWordChars(' ', ' ', true); 
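To see why a single call like this changes how words are split, it can help to picture a word state as a table of flags, one per character. The following is a minimal sketch under that assumption, not the library's WordState source: the class SketchWordState, its nextWord() method, and its default word characters (letters and digits only) are hypothetical names and choices made for illustration.

```java
// Hypothetical stand-in for a word state; NOT the library's WordState.
class SketchWordState {
    // One flag per character code 0..255: may this character appear in a word?
    private final boolean[] wordChar = new boolean[256];

    SketchWordState() {
        // Assumed defaults for this sketch: letters and digits only.
        setWordChars('a', 'z', true);
        setWordChars('A', 'Z', true);
        setWordChars('0', '9', true);
    }

    // Mark every character in [from, to] as allowed (or not) inside a word.
    void setWordChars(int from, int to, boolean allowed) {
        for (int c = Math.max(from, 0); c <= to && c < wordChar.length; c++) {
            wordChar[c] = allowed;
        }
    }

    // Read characters from text, beginning at start, while they are word characters.
    String nextWord(String text, int start) {
        int end = start;
        while (end < text.length()
                && text.charAt(end) < wordChar.length
                && wordChar[text.charAt(end)]) {
            end++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        SketchWordState ws = new SketchWordState();
        System.out.println(ws.nextWord("French Roast, 5", 0)); // prints "French"
        ws.setWordChars(' ', ' ', true); // the range ' '..' ' covers only the space
        System.out.println(ws.nextWord("French Roast, 5", 0)); // prints "French Roast"
    }
}
```

In this sketch the comma still ends the word, because only the space was added to the set of word characters.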

9.9.2. Changing Which State the Tokenizer Enters

The example in Section 9.7.1 changes the tokenizer so that it enters a quote state on seeing a “#”. It uses this line:

t.setCharacterState('#', '#', t.quoteState()); 
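One way to picture this call is as an update to a lookup table that maps each possible first character of a token to the state that will consume it. The sketch below illustrates that idea only. The class SketchDispatch, its stateOn() method, the use of state names in place of state objects, and the default assignments are all assumptions made here, not the library's Tokenizer.

```java
// Hypothetical sketch of per-character state dispatch; NOT the library's Tokenizer.
class SketchDispatch {
    // State names stand in for state objects to keep the sketch short.
    private final String[] stateFor = new String[256];

    SketchDispatch() {
        // Assumed defaults for this sketch.
        setCharacterState(0, 255, "symbol");
        setCharacterState('a', 'z', "word");
        setCharacterState('A', 'Z', "word");
        setCharacterState('0', '9', "number");
        setCharacterState('"', '"', "quote");
    }

    // Assign a state to every character in [from, to].
    void setCharacterState(int from, int to, String state) {
        for (int c = Math.max(from, 0); c <= to && c < stateFor.length; c++) {
            stateFor[c] = state;
        }
    }

    // The state entered when c is the first character of a token.
    String stateOn(char c) {
        return c < stateFor.length ? stateFor[c] : "symbol";
    }

    public static void main(String[] args) {
        SketchDispatch t = new SketchDispatch();
        System.out.println(t.stateOn('#')); // prints "symbol"
        t.setCharacterState('#', '#', "quote");
        System.out.println(t.stateOn('#')); // prints "quote"
    }
}
```

Because the table is consulted only for the first character of each token, reassigning “#” does not affect how the quote state itself reads the rest of the quoted text.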

9.9.3. Adding ...
