Processing Binary Files

Problem

Your system distinguishes between text and binary files. How do you?

Solution

Use the binmode function on the filehandle:

binmode(HANDLE);

Discussion

Not everyone agrees what constitutes a line in a text file, because one person’s textual character set is another’s binary gibberish. Even when everyone is using ASCII instead of EBCDIC, Rad50, or Unicode, discrepancies arise.

As mentioned in the Introduction, there is no such thing as a newline character. It is purely virtual, a figment of the operating system, standard libraries, device drivers, and Perl.

Under Unix or Plan9, a "\n" represents the physical sequence "\cJ" (the Perl double-quote escape for Ctrl-J), a linefeed. However, on a terminal that’s not in raw mode, an Enter key generates an incoming "\cM" (a carriage return) which turns into "\cJ", whereas an outgoing "\cJ" turns into "\cM\cJ". This strangeness doesn’t happen with normal files, just terminal devices, and it is handled strictly by the device driver.

On a Mac, a "\n" is usually represented by "\cM"; just to make life interesting (and because the standard requires that "\n" and "\r" be different), a "\r" represents a "\cJ". This is exactly the opposite of the way that Unix, Plan9, VMS, CP/M, or nearly anyone else does it. So, Mac programmers writing files for other systems or talking over a network have to be careful. If you send out "\n", you’ll deliver a "\cM", and no "\cJ" will be seen. Most network services prefer to receive and ...

Get Perl Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.