# 4.17. Counting the Number of Characters, Words, and Lines in a Text File

## Problem

You have to count the number of characters, words, and lines (or some other type of text element) in a text file.

## Solution

Use an input stream to read the characters in, one at a time, and increment local statistics as you encounter characters, words, and line breaks. Example 4-26 contains the function `countStuff`, which does exactly that.

Example 4-26. Calculating statistics about a text file

```cpp
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cctype>

using namespace std;

void countStuff(istream& in,
                int& chars,
                int& words,
                int& lines) {

   char cur = '\0';
   char last = '\0';
   chars = words = lines = 0;

   while (in.get(cur)) {
      if (cur == '\n') {           // Ends a line for both "\n" and "\r\n"
         lines++;
         if (last == '\r')         // The '\r' of "\r\n" is part of the line
            chars--;               // break, so take it back out of the count
      }
      else
         chars++;
      // Cast to unsigned char: isalnum is undefined for negative values
      if (!std::isalnum(static_cast<unsigned char>(cur)) &&  // This is the end
          std::isalnum(static_cast<unsigned char>(last)))    // of a word
         words++;
      last = cur;
   }
   if (chars > 0) {                                          // Adjust word and
      if (std::isalnum(static_cast<unsigned char>(last)))    // line counts for
         words++;                                            // special case
      lines++;
   }
}

int main(int argc, char** argv) {

   if (argc < 2)
      return EXIT_FAILURE;

   ifstream in(argv[1]);

   if (!in)
      return EXIT_FAILURE;

   int c, w, l;

   countStuff(in, c, w, l);

   cout << "chars: " << c << '\n';
   cout << "words: " << w << '\n';
   cout << "lines: " << l << '\n';
}
```

## Discussion

The algorithm here is straightforward. Characters are easy: increment the character count for each character that `get` reads, other than line breaks. Lines are only slightly more difficult, since the way a line ends depends on the operating system. Thankfully, it's usually either a newline character (`\n`) or a carriage-return/line-feed sequence (`\r\n`). By keeping track of the current and last characters, you can easily capture occurrences of this sequence. Words are easy or hard, depending on your definition of a word.

For Example 4-26, I consider a word to be a contiguous sequence of alphanumeric characters. As I look at each character in the input stream, when I encounter a nonalphanumeric character, I look at the previous character to see if it was alphanumeric. If it was, then a word has just ended and I can increment the word count. I can tell if a character is alphanumeric by using `isalnum` from `<cctype>`. But that's not all—you can test characters for a number of different qualities with similar functions. See Table 4-3 for the functions you can use to test character qualities. For wide characters, use the functions of the same name but with a "w" after the "is," e.g., `iswspace`. The wide-character versions are declared in the header `<cwctype>`.

Table 4-3. Character test functions from <cctype> and <cwctype>

| Function | Description |
| --- | --- |
| `isalpha`, `iswalpha` | Alpha characters: a-z, A-Z (upper- or lowercase). |
| `isupper`, `iswupper` | Alpha characters in uppercase only: A-Z. |
| `islower`, `iswlower` | Alpha characters in lowercase only: a-z. |
| `isdigit`, `iswdigit` | Numeric characters: 0-9. |
| `isxdigit`, `iswxdigit` | Hexadecimal numeric characters: 0-9, a-f, A-F. |
| `isspace`, `iswspace` | Whitespace characters: `' '`, `\n`, `\t`, `\v`, `\r`, `\f`. |
| `iscntrl`, `iswcntrl` | Control characters: ASCII 0-31 and 127. |
| `ispunct`, `iswpunct` | Punctuation characters that don't belong to the previous groups. |
| `isalnum`, `iswalnum` | `isalpha` or `isdigit` is true. |
| `isprint`, `iswprint` | Printable ASCII characters. |
| `isgraph`, `iswgraph` | `isalpha` or `isdigit` or `ispunct` is true. |
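To see a few of these tests in action, here is a small sketch. The helper names `classify` and `isWordChar` are made up for illustration, and the results assume the default "C" locale, since the classification functions are locale-dependent. Note the cast to `unsigned char`: the `<cctype>` functions have undefined behavior when handed a negative `char` value.

```cpp
#include <cctype>
#include <cwctype>
#include <string>

// Hypothetical helper: returns the name of the first category that matches,
// checking a handful of the <cctype> tests from Table 4-3 in a fixed order.
std::string classify(char c) {
   const unsigned char uc = static_cast<unsigned char>(c);

   if (std::isdigit(uc)) return "digit";
   if (std::isupper(uc)) return "upper";
   if (std::islower(uc)) return "lower";
   if (std::isspace(uc)) return "space";   // checked before cntrl, so '\t' is "space"
   if (std::ispunct(uc)) return "punct";
   if (std::iscntrl(uc)) return "cntrl";
   return "other";
}

// Hypothetical wide-character counterpart of the alphanumeric test that
// Example 4-26 uses to decide whether a character belongs to a word.
bool isWordChar(wchar_t c) {
   return std::iswalnum(static_cast<std::wint_t>(c)) != 0;
}
```

Because a character can satisfy more than one test (a tab is both whitespace and a control character, for example), the order of the checks in `classify` decides which label wins.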

After all characters have been read in and the end of the stream has been reached, there is a bit of adjustment to do. First, the loop only counts line breaks, and not, strictly speaking, lines. When the last line has no trailing line break, the count is therefore one less than the actual number of lines. To make this problem go away I just increment the line count by one if there are more than zero characters in the file. (A file that does end with a line break then reports one more line than tools that count only line terminators, because the empty remainder after the final break is treated as a line.) Second, if the stream ends with an alphanumeric character, the test for the end of the last word will never occur because I can't test the next character. To account for this, I check if the last character in the stream is alphanumeric (also only when there are more than zero characters in the file) and increment the word count by one.

The technique in Example 4-26 of using streams is nearly identical to that described in Recipe 4.14 and Recipe 4.15, but simpler since it's just inspecting the file and not making any changes.