ghci> :type lines
lines :: String -> [String]
ghci> lines "line 1\nline 2"
["line 1","line 2"]
While lines looks useful, it relies on us reading a
file in “text mode” in order to work. Text mode is a feature common to many
programming languages; it provides a special behavior when we read and
write files on Windows. When we read a file in text mode, the file I/O
library translates the line-ending sequence "\r\n"
(carriage return followed by newline) to
"\n" (newline alone), and it does the
reverse when we write a file. On Unix-like systems, text mode does not
perform any translation. As a result of this difference, if we read a
file on one platform that was written on the other, the line endings are
likely to become a mess. (Both readFile and
writeFile operate in text mode.)
The lines function splits only on newline
characters, leaving carriage returns dangling at the ends of lines. If
we read a Windows-generated text file on a Linux or Unix box, we’ll get
trailing carriage returns at the end of each line.
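To make the dangling carriage returns concrete, here is a small sketch that applies lines to a string with Windows-style endings. The names windowsText and naiveSplit are made up for illustration; only lines comes from the Prelude.

```haskell
-- A hypothetical sample: two lines terminated Windows-style (CR followed by LF).
windowsText :: String
windowsText = "line 1\r\nline 2\r\n"

-- lines splits on '\n' only, so every element keeps its trailing '\r'.
naiveSplit :: [String]
naiveSplit = lines windowsText
-- naiveSplit == ["line 1\r","line 2\r"]
```

Printing or comparing these strings will behave oddly, because the carriage return is still part of each line.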
We have comfortably used Python’s “universal newline” support for years; this transparently handles Unix and Windows line-ending conventions for us. We would like to provide something similar in Haskell.
-- file: ch04/SplitLines.hs
splitLines :: String -> [String]
Our function’s type signature indicates that it accepts a single string, the contents of a file with some unknown line-ending convention. It returns a list of strings, representing each line from the file:
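-- file: ch04/SplitLines.hs
-- An empty string contains no lines.
splitLines [] = []
-- Otherwise, split off the text before the first line terminator,
-- skip the terminator, and carry on with whatever follows it.
splitLines cs =
    let (pre, suf) = break isLineTerminator cs
    in  pre : case suf of
                ('\r':'\n':rest) -> splitLines rest
                ('\r':rest)      -> splitLines rest
                ('\n':rest)      -> splitLines rest
                _                -> []

isLineTerminator c = c == '\r' || c == '\n'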
Before we dive into detail, notice first
how we organized our code. We presented the important pieces of code
first, keeping the definition of
isLineTerminator until later. Because we have
given the helper function a readable name, we can guess what it does
even before we’ve read it, which eases the smooth “flow” of
reading the code.
The Prelude defines a function named
break that we can
use to partition a list into two parts. It takes a function as its first
parameter. That function must examine an element of the list and return
a Bool to indicate whether to break the list at that point.
The break function returns a pair, which consists of the sublist consumed
before the predicate returned True (the prefix) and the rest of the list (the
suffix):
ghci> break odd [2,4,5,6,8]
([2,4],[5,6,8])
ghci> :module +Data.Char
ghci> break isUpper "isUpper"
("is","Upper")
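To see how such a function can walk a list one element at a time, here is a hand-written sketch. It is not the Prelude's actual source, and the name myBreak is ours; it is only meant to behave like break on finite lists.

```haskell
-- An illustrative reimplementation of break (the name myBreak is hypothetical).
myBreak :: (a -> Bool) -> [a] -> ([a], [a])
myBreak _ [] = ([], [])                  -- nothing left to examine
myBreak p xs@(x:rest)
  | p x       = ([], xs)                 -- predicate holds: stop, x stays in the suffix
  | otherwise = let (pre, suf) = myBreak p rest
                in  (x : pre, suf)       -- predicate fails: x joins the prefix
```

Note that the element on which the predicate first succeeds ends up at the head of the suffix, which is why a line terminator, if present, is the first thing we find there.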
In the second equation, we first apply
break to our input string. The
prefix is the substring before a line terminator, and the suffix is the
remainder of the string. The suffix will include the line terminator, if
any is present.
The pre : expression tells us
that we should add the
pre value to the front of the
list of lines. We then use a case
expression to inspect the suffix, so we can decide what to do next. The
result of the
case expression will be
used as the second argument to the (:) list constructor.
The first pattern matches a string that begins with a
carriage return, followed by a newline. The variable
rest is bound to the remainder of the string. The
other patterns are similar, so they ought to be easy to follow.
A prose description of a Haskell function isn’t necessarily easy to follow. We can gain a better understanding by stepping into ghci and observing the behavior of the function in different circumstances.
ghci> break isLineTerminator "foo"
("foo","")
ghci> break isLineTerminator "foo\r\nbar"
("foo","\r\nbar")
Because the suffix begins with a carriage return
followed by a newline, we match on the first branch of the
case expression. This gives us
pre bound to "foo" and
rest bound to
"bar". We apply
splitLines recursively, this time on "bar" alone, and
add "foo" to the front of the result:
ghci> "foo" : ["bar"]
["foo","bar"]
This sort of experimenting with ghci is a helpful way to understand and debug the behavior of a piece of code. It has an even more important benefit that is almost accidental in nature. It can be tricky to test complicated code from ghci, so we will tend to write smaller functions, which can further help the readability of our code.
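The same kind of experiment can be kept in a source file. The sketch below repeats the definition of splitLines so that it stands alone, and adds a made-up input, mixed, that combines all three terminators the case expression handles. The type signature on isLineTerminator is our addition.

```haskell
splitLines :: String -> [String]
splitLines [] = []
splitLines cs =
    let (pre, suf) = break isLineTerminator cs
    in  pre : case suf of
                ('\r':'\n':rest) -> splitLines rest
                ('\r':rest)      -> splitLines rest
                ('\n':rest)      -> splitLines rest
                _                -> []

isLineTerminator :: Char -> Bool
isLineTerminator c = c == '\r' || c == '\n'

-- Hypothetical input: a newline, then a CR+LF pair, then a bare carriage return.
mixed :: String
mixed = "unix\nwindows\r\ncr-only\rlast"

-- splitLines mixed    == ["unix","windows","cr-only","last"]
-- splitLines "a\n\nb" == ["a","","b"]   (an empty line is preserved)
-- splitLines "foo\n"  == ["foo"]        (no extra empty line at the end)
```

Loading this file into ghci and evaluating splitLines mixed exercises the first three branches in turn, with the fourth branch ending the recursion.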
Let’s hook our
splitLines function into the little
framework that we wrote earlier. Make a copy of the InteractWith.hs source file; let’s call the
new file FixLines.hs. Add the
splitLines function to the new
source file. Since our function must produce a single
String, we must stitch the list of lines back together.
The Prelude provides an
unlines function that concatenates a list of strings, adding a newline
to the end of each:
-- file: ch04/SplitLines.hs
fixLines :: String -> String
fixLines input = unlines (splitLines input)
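For orientation, here is a rough sketch of how the pieces of FixLines.hs might fit together. The earlier framework is not reproduced in this section, so the shape of interactWith is an assumption, and runFixLines is a hypothetical helper introduced only for this illustration.

```haskell
splitLines :: String -> [String]
splitLines [] = []
splitLines cs =
    let (pre, suf) = break isLineTerminator cs
    in  pre : case suf of
                ('\r':'\n':rest) -> splitLines rest
                ('\r':rest)      -> splitLines rest
                ('\n':rest)      -> splitLines rest
                _                -> []

isLineTerminator :: Char -> Bool
isLineTerminator c = c == '\r' || c == '\n'

fixLines :: String -> String
fixLines input = unlines (splitLines input)

-- Assumed shape of the earlier framework: read a file, transform its
-- contents with a pure function, and write the result to another file.
interactWith :: (String -> String) -> FilePath -> FilePath -> IO ()
interactWith function inputFile outputFile = do
  input <- readFile inputFile
  writeFile outputFile (function input)

-- Hypothetical driver: expects exactly two file names.
runFixLines :: [String] -> IO ()
runFixLines [input, output] = interactWith fixLines input output
runFixLines _               = putStrLn "error: exactly two arguments needed"
```

With this layout, a main action could pass the result of getArgs (from System.Environment) to runFixLines. Because unlines appends a newline to every line, the output always ends with one, even if the input did not.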
$ ghc --make FixLines
[1 of 1] Compiling Main ( FixLines.hs, FixLines.o )
Linking FixLines ...
If you are on a Windows system, find and download a
text file that was created on a Unix system (for example, gpl-3.0.txt [http://www.gnu.org/licenses/gpl-3.0.txt]). Open it in
the standard Notepad text editor. The lines should all run together,
making the file almost unreadable. Process the file using the
FixLines command you just created, and open
the output file in Notepad. The line endings should now be fixed up.
On Unix-like systems, the standard pagers and editors
hide Windows line endings, making it more difficult to verify that
FixLines is actually eliminating
them. Here are a few commands that should help:
$ file gpl-3.0.txt
gpl-3.0.txt: ASCII English text
$ unix2dos gpl-3.0.txt
unix2dos: converting file gpl-3.0.txt to DOS format ...
$ file gpl-3.0.txt
gpl-3.0.txt: ASCII English text, with CRLF line terminators