Receiving Characters

When the parser reads #PCDATA, it passes this text to the characters() method as an array of chars. Although it would be simpler if characters() just took a String as an argument, using a char[] array allows certain performance optimizations. In particular, parsers often store a large chunk of the original document in a single array, and repeatedly pass that same array to the characters() method, while updating the values of start and length.

On the flip side, when there's a large amount of text between two tags with no intervening markup, the parser may choose to call characters() multiple times even though it doesn't need to. Xerces generally won't pass more than 16K of text in one call. Crimson is limited to about 8K of ...

Get Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.