You want to read in a text file that is delimited by commas and newlines (or any other pair of delimiters, for that matter). Records are delimited by one character, and fields within a record are delimited by another. For example, a comma-separated text file of employee information may look like the following:
Smith, Bill, 5/1/2002, Active
Stanford, John, 4/5/1999, Inactive
Such files are usually interim storage for data sets exported from spreadsheets, databases, or other file formats.
See Example 4-32 for how to do this.
If you read the text into strings one contiguous chunk at a time using getline (the function template defined in <string>), you can use the split function I presented in Recipe 4.6 to parse the text and put it in a data structure, in this case, a vector.
Example 4-32. Reading in a delimited file
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

// Split s on the delimiter c and append each field to v
void split(const string& s, char c, vector<string>& v) {
   string::size_type i = 0;
   string::size_type j = s.find(c);

   while (j != string::npos) {
      v.push_back(s.substr(i, j-i));
      i = ++j;
      j = s.find(c, j);
   }
   v.push_back(s.substr(i));   // Last field, or all of s if c never appears
}

void loadCSV(istream& in, vector<vector<string>*>& data) {
   vector<string>* p = NULL;
   string tmp;

   while (getline(in, tmp, '\n')) {   // Grab the next line; stop at end of input
      p = new vector<string>();
      split(tmp, ',', *p);            // Use split from Recipe 4.6
      data.push_back(p);
      cout << tmp << '\n';
   }
}

int main(int argc, char** argv) {
   if (argc < 2)
      return(EXIT_FAILURE);

   ifstream in(argv[1]);
   if (!in)
      return(EXIT_FAILURE);

   vector<vector<string>*> data;
   loadCSV(in, data);

   // Go do something useful with the data...

   for (vector<vector<string>*>::iterator p = data.begin();
        p != data.end(); ++p) {
      delete *p;                      // Be sure to dereference p!
   }
}
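Example 4-32 hardcodes '\n' and ',' as its delimiters. As a minimal sketch of the "any other pair of delimiters" case, the version below takes both delimiters as parameters. The names splitFields, loadDelimited, and parseDelimited are hypothetical and not part of the recipe, and the rows are stored by value rather than by pointer purely to keep the sketch short.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the recipe): split one record into fields.
std::vector<std::string> splitFields(const std::string& record, char fieldDelim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type pos = record.find(fieldDelim);
    while (pos != std::string::npos) {
        fields.push_back(record.substr(start, pos - start));
        start = pos + 1;
        pos = record.find(fieldDelim, start);
    }
    fields.push_back(record.substr(start));  // last field, or the whole record
    return fields;
}

// Hypothetical generalization of loadCSV: both delimiters are parameters.
std::vector<std::vector<std::string> > loadDelimited(std::istream& in,
                                                     char recordDelim,
                                                     char fieldDelim) {
    std::vector<std::vector<std::string> > rows;
    std::string record;
    // getline accepts any single character as the record terminator.
    while (std::getline(in, record, recordDelim)) {
        rows.push_back(splitFields(record, fieldDelim));
    }
    return rows;
}

// Convenience wrapper for text that is already in memory.
std::vector<std::vector<std::string> > parseDelimited(const std::string& text,
                                                      char recordDelim,
                                                      char fieldDelim) {
    std::istringstream in(text);
    return loadDelimited(in, recordDelim, fieldDelim);
}
```

Calling parseDelimited("a|b|c;d|e", ';', '|') would yield two records, the first with three fields and the second with two. Passing '\n' and ',' gives the same splitting as Example 4-32.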
There isn’t much in Example 4-32 that hasn’t been covered already. I discussed getline in Recipe 4.19 and vectors in Recipe 4.3. The only piece worth mentioning has to do with memory allocation. loadCSV creates a new vector for each line of data it reads in and stores it in yet another vector of pointers to vectors. Since the memory for each of these vectors is allocated on the heap, somebody has to de-allocate it, and that somebody is you (and not the vector implementation).
The vector has no knowledge of whether it contains a value or a pointer to a value, or anything else. All it knows is that when it’s destroyed, it needs to call the destructor for each element it contains. If the vector stores objects, then this is fine; the object is properly destroyed. But if the vector contains pointers, the pointer is destroyed, but not the object it points to.
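The difference between storing objects and storing pointers can be made visible with a small sketch. Tracked is a hypothetical type, not part of the recipe, that counts how many of its instances are alive; the three helper functions are likewise made up for illustration.

```cpp
#include <vector>

// Hypothetical type that counts how many of its instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Objects stored by value: the vector destroys each element when it dies.
int aliveAfterValueVector() {
    {
        std::vector<Tracked> v(3);
    }                          // v is destroyed here, and so are its elements
    return Tracked::alive;
}

// Pointers stored: the vector destroys only the pointers, not the pointees.
// Copies of the pointers are handed back so the caller can still clean up.
int aliveAfterPointerVector(std::vector<Tracked*>& survivors) {
    {
        std::vector<Tracked*> v;
        for (int i = 0; i < 3; ++i) {
            v.push_back(new Tracked());
        }
        survivors = v;
    }                          // v is destroyed here; the objects are not
    return Tracked::alive;
}

// Manual cleanup, in the same style as the loop in Example 4-32.
void deleteAll(std::vector<Tracked*>& v) {
    for (std::vector<Tracked*>::iterator p = v.begin(); p != v.end(); ++p) {
        delete *p;
    }
    v.clear();
}
```

After aliveAfterValueVector returns, no Tracked objects remain. After aliveAfterPointerVector returns, all three are still alive until deleteAll is called on the saved pointers.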
There are two ways to ensure the memory is freed. First, you can do what I did in Example 4-32 and free it manually, like this:
for (vector<vector<string>*>::iterator p = data.begin();
     p != data.end(); ++p) {
   delete *p;
}
Or you can use a reference-counted pointer, such as shared_ptr from the Boost project’s smart_ptr library, which was standardized in C++11 as std::shared_ptr. But doing so is nontrivial, so I recommend reading up on what a shared_ptr is and how it works. For more information on Boost in general, see the homepage at www.boost.org.
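As a rough sketch of the reference-counted approach, the version below uses std::shared_ptr, the C++11 standard-library counterpart of Boost's shared_ptr, so it assumes a C++11 or later compiler. RowPtr and loadCSVShared are hypothetical names, and the field splitting uses getline on a string stream as a stand-in for the recipe's split function.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias: a reference-counted pointer to one row of fields.
typedef std::shared_ptr<std::vector<std::string> > RowPtr;

// Hypothetical variant of loadCSV that stores shared_ptrs instead of raw pointers.
void loadCSVShared(std::istream& in, std::vector<RowPtr>& data) {
    std::string line;
    while (std::getline(in, line)) {
        RowPtr row = std::make_shared<std::vector<std::string> >();

        // Stand-in for split: note that a trailing empty field is dropped.
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            row->push_back(field);
        }
        data.push_back(row);
    }
}
// No delete loop is needed: each row is freed when the last shared_ptr
// that refers to it is destroyed, for example when data goes out of scope.
```

With this layout, the cleanup loop at the end of main in Example 4-32 would no longer be necessary, at the cost of a reference count per row.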