Parsing Comma-Separated Data

Problem

You have a string or a file of lines containing comma-separated values (CSV) that you need to read in. Many MS-Windows-based spreadsheets and some databases use CSV to export data.

Solution

Use my CSV class or a regular expression (see Chapter 4).
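
To give a rough idea of the regular-expression route, here is a minimal sketch. It is not the Chapter 4 version; the class name RegexCsv and the field pattern are assumptions made for illustration. One pattern matches a single field: either a quoted field, in which a doubled quote ("") stands for one literal quote, or an unquoted run of non-comma characters. The loop applies it at each field start and then expects a comma or the end of the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based CSV line splitter (a sketch, not a library API).
class RegexCsv {
    // Group 1: body of a quoted field ("" = escaped quote).
    // Group 2: an unquoted field (any run of non-comma characters, possibly empty).
    private static final Pattern FIELD =
        Pattern.compile("\"([^\"]*(?:\"\"[^\"]*)*)\"|([^,]*)");

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        int pos = 0;
        while (true) {
            m.region(pos, line.length());
            m.lookingAt(); // always succeeds: group 2 can match the empty string
            if (m.group(1) != null) {
                fields.add(m.group(1).replace("\"\"", "\"")); // undo quote doubling
            } else {
                fields.add(m.group(2));
            }
            pos = m.end(); // absolute index in line, not relative to the region
            if (pos == line.length()) {
                return fields;
            }
            if (line.charAt(pos) != ',') {
                throw new IllegalArgumentException("Expected ',' at index " + pos);
            }
            pos++; // step past the comma; a trailing comma yields a final empty field
        }
    }

    public static void main(String[] args) {
        for (String f : parse("\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625")) {
            System.out.println(f);
        }
    }
}
```

Matching one field at a time with lookingAt() keeps the pattern small and makes malformed input (text after a closing quote) easy to reject.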

Discussion

CSV is deceptive. It looks simple at first glance, but values may be quoted or unquoted, and a quoted value may itself contain commas or escaped quotes. That far exceeds the capabilities of the StringTokenizer class (Section 3.3), so either considerable Java coding or a regular expression is required. I’ll show both ways.
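
To see why StringTokenizer falls short, consider this small sketch (the class and method names are illustrative only). It splits on every comma regardless of quoting, so a quoted value containing a comma is broken in two, the quote characters stay in the tokens, and an empty field between two adjacent commas disappears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative only: shows what naive comma tokenizing does to CSV input.
class TokenizerPitfall {
    // Splits on every comma; StringTokenizer has no notion of quotes
    // and never returns empty tokens.
    static List<String> naiveSplit(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two logical fields: a quoted name that contains a comma, and a number.
        String line = "\"Smith, J.\",42";
        System.out.println(naiveSplit(line).size()); // prints 3, not 2
        // Three logical fields, the middle one empty.
        System.out.println(naiveSplit("a,,b").size()); // prints 2, not 3
    }
}
```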

First, a Java program. Assume for now that we have a class called CSV that has a no-argument constructor and a method called parse( ), which takes a string representing one line of the input file and returns the fields of that line. For flexibility, the fields are returned as an Iterator (see Section 7.5). I simply use the Iterator’s hasNext( ) method to control the loop, and its next( ) method to get each field in turn.
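
The CSV class itself is not shown at this point, so here is a minimal hand-coded sketch of a class with the interface just described: a no-argument constructor and a parse( ) method that returns an Iterator over the fields. It is a hypothetical stand-in, not the author’s implementation, and it assumes that a quote inside a quoted field is escaped by doubling it (""). It walks the line one character at a time, tracking whether it is inside quotes, so commas inside quotes are treated as ordinary data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the CSV class assumed in the text; NOT the author's code.
// Assumptions: fields are comma-separated, and "" inside quotes means one literal quote.
class CSV {
    private static final char SEP = ',';
    private static final char QUOTE = '"';

    public CSV() {
        // No configuration needed for this sketch.
    }

    /** Splits one line into fields and returns them as an Iterator. */
    public Iterator<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != QUOTE) {
                    field.append(c); // commas are ordinary data inside quotes
                } else if (i + 1 < line.length() && line.charAt(i + 1) == QUOTE) {
                    field.append(QUOTE); // doubled quote: keep one, skip the other
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == QUOTE) {
                inQuotes = true; // opening quote (lenient: accepted anywhere in a field)
            } else if (c == SEP) {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // the last field, possibly empty
        return fields.iterator();
    }
}
```

Compiled alongside the CSVSimple program that follows, a class like this lets that demo run as written. A production parser would also need to handle quoted fields that span more than one line of the file.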

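import java.util.Iterator;

/* Simple demo of CSV parser class.
 * CSV is the class assumed above: a no-argument constructor, plus
 * parse(String), which returns the fields of one line as an Iterator.
 */
public class CSVSimple {
    public static void main(String[] args) {
        CSV parser = new CSV();
        Iterator<?> it = parser.parse(
            "\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625");
        while (it.hasNext()) {             // loop while fields remain
            System.out.println(it.next()); // print one field per line
        }
    }
}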

Once the compiler has processed the backslash escapes, the string being parsed is actually the following:

"LU",86.25,"11/4/1998","2:19PM",+4.0625
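
To confirm that the backslashes exist only in the Java source and not in the string the parser receives, a small check like the following can help (the class name EscapeCheck is hypothetical). It uses the same literal as CSVSimple, prints it, and tests for a backslash character.

```java
// Hypothetical check: each \" in the source becomes a single " at run time.
class EscapeCheck {
    static final String LINE = "\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625";

    public static void main(String[] args) {
        System.out.println(LINE); // prints "LU",86.25,"11/4/1998","2:19PM",+4.0625
        System.out.println(LINE.indexOf('\\') >= 0); // prints false: no backslashes remain
        System.out.println(LINE.length()); // prints 39
    }
}
```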

Running CSVSimple yields the following output:

> java ...
