12.4. How to Process Every Character in a Text File

Problem

You want to open a text file and process every character in the file.

Solution

If performance isn’t a concern, write your code in a straightforward, obvious way:

val source = io.Source.fromFile("/Users/Al/.bash_profile")
// a Source is an Iterator[Char], so the loop visits one character at a time
for (char <- source) {
  println(char.toUpper)
}
source.close()
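If the loop throws an exception, the call to close is never reached and the file stays open. The following is a minimal sketch of one way to guard against that, assuming Scala 2.13 or later (where scala.util.Using is available) and a script or REPL session. The helper name upperCaseChars and the temporary file are illustrative and not part of the original recipe:

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

// Hypothetical helper: returns every character of the file, upper-cased.
// Using.resource closes the Source even if the body throws.
def upperCaseChars(path: String): String =
  Using.resource(Source.fromFile(path)) { source =>
    source.map(_.toUpper).mkString
  }

// Illustrative input: a small temporary file rather than a real profile file.
val tmp = Files.createTempFile("chars", ".txt")
Files.write(tmp, "hello\nworld\n".getBytes("UTF-8"))
val result = upperCaseChars(tmp.toString)
Files.delete(tmp)
```

On Scala 2.12 and earlier, a try/finally block around the loop with source.close() in the finally clause gives the same guarantee.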

However, be aware that this code may be slow on large files. For instance, the following method that counts the number of lines in a file takes 100 seconds to run on an Apache access logfile that is ten million lines long:

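// run time: 100 seconds
def countLines1(source: io.Source): Long = {
  val NEWLINE = '\n'
  var newlineCount = 0L
  for {
    char <- source
    // compare as a Char: toByte truncates to 8 bits, so other
    // characters (such as U+010A) would also match the byte value 10
    if char == NEWLINE
  } newlineCount += 1
  newlineCount
}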
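Timings like this depend heavily on the machine, the JVM, and the file. As a rough way to take your own measurements, here is a small sketch of a timing helper. The name timed is hypothetical, and the in-memory sample stands in for a real logfile:

```scala
import scala.io.Source

// Hypothetical helper: runs `block` once and returns its result
// together with the elapsed wall-clock time in milliseconds.
def timed[A](block: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = block
  (result, (System.nanoTime() - start) / 1000000L)
}

// Illustrative in-memory input; substitute Source.fromFile(...) for a real file.
val sample = "line one\nline two\nline three\n"
val (newlines, millis) = timed {
  Source.fromString(sample).count(_ == '\n')
}
println(s"$newlines newlines in $millis ms")
```

A single run like this includes JVM warm-up, so treat the numbers as a coarse comparison rather than a benchmark.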

The time can be significantly reduced by using the getLines method to retrieve one line at a time, and then working through the characters in each line. The following line-counting algorithm counts the same ten million lines in just 23 seconds:

// run time: 23 seconds
// use getLines, then work through the characters in each line.
// getLines strips the line terminators, so no character in a line
// ever equals a newline; count one per line instead.
def countLines2(source: io.Source): Long = {
  var lineCount = 0L
  var charCount = 0L
  for (line <- source.getLines()) {
    // visit every character in the line
    for (c <- line) charCount += 1
    lineCount += 1
  }
  lineCount
}

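Both algorithms work through the characters in the file (getLines returns each line without its terminator, so the second one skips only the newline characters themselves), but by using getLines in the second algorithm, the run time drops from 100 seconds to 23.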
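One detail matters when you process characters line by line: getLines has already removed the line terminators, so a test for the newline character inside the inner loop never succeeds. The following sketch illustrates the difference with Source.fromString. The sample text and the helper names newlinesDirect and newlinesViaLines are made up for this example:

```scala
import scala.io.Source

val text = "ab\ncd\n"   // illustrative sample: two lines, two newline characters

// Character-by-character traversal: the Source yields '\n' like any other character.
def newlinesDirect(s: String): Long = {
  var n = 0L
  for (c <- Source.fromString(s) if c == '\n') n += 1
  n
}

// Line-by-line traversal: getLines() has already removed the terminators.
def newlinesViaLines(s: String): Long = {
  var n = 0L
  for {
    line <- Source.fromString(s).getLines()
    c    <- line
    if c == '\n'
  } n += 1
  n
}

val direct   = newlinesDirect(text)                     // 2
val viaLines = newlinesViaLines(text)                   // 0
val lines    = Source.fromString(text).getLines().size  // 2
```

If you need a line count while using getLines, count the lines themselves rather than looking for newline characters inside them.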

Note

Notice that there’s the equivalent of two for loops in the second example. ...
