The Budget

As an example of this process, I'm going to use U.S. federal government budget authorization data, which the Office of Management and Budget (OMB) publishes in several equivalent flat formats, even though the data itself is far from flat. This is a good example of the sort of legacy data developers often have to deal with. The complete document [http://w3.access.gpo.gov/usbudget/fFY2002/db.html] consists of 3,185 line items, each with 43 separate fields. In the comma-separated values (CSV) version of the file, a typical line item looks like this:

 "418","National Endowment for the Humanities","00","National Endowment for the Humanities","0200","National Endowment for the Humanities: grants and administration","59","503","Research ...
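
To make the shape of such a record concrete, here is a minimal sketch of splitting one line item into its fields. It rests on assumptions rather than on anything OMB documents: that every field is wrapped in double quotes, that a literal quote inside a field is written as two quotes, and that no field spans more than one line. The class name BudgetCsvLine and the method splitQuotedCsv are hypothetical names chosen for illustration, and the sample string contains only the first eight fields of the line item shown above, not the full 43.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any library.
public class BudgetCsvLine {

    // Splits one CSV record into its fields.
    // Assumption: fields may be wrapped in double quotes, and a literal
    // quote inside a quoted field is escaped by doubling it.
    public static List<String> splitQuotedCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        // Doubled quote: keep one literal quote character.
                        current.append('"');
                        i++;
                    } else {
                        // Closing quote of the field.
                        inQuotes = false;
                    }
                } else {
                    // Commas inside quotes are data, not separators.
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                // Unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing comma.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Only the first eight fields of the sample line item.
        String sample = "\"418\",\"National Endowment for the Humanities\",\"00\","
            + "\"National Endowment for the Humanities\",\"0200\","
            + "\"National Endowment for the Humanities: grants and administration\","
            + "\"59\",\"503\"";

        List<String> fields = splitQuotedCsv(sample);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println(i + ": " + fields.get(i));
        }
    }
}
```

A character-by-character scan is used instead of String.split(",") because a comma can legally appear inside a quoted field, and a plain split would cut such a field in two.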
