Reading Data from Files with Non-standard Formats Using scan

The scan function is very flexible, but as a consequence it is much harder to use than read.table. This example uses the US murder data. The filename comes first, in the usual format (enclosed in double quotes, with paired backslashes separating the drive name from the folder name and the folder name from the file name). Then comes skip=1, because the first line of the file contains the variable names (the situation you would indicate with header=T in read.table). Next comes what, a list with one element per variable (that is, per column to be read; 4 in this case), each element specifying the type of that column (character, written "", in this case):

murders<-scan("c:\\temp\\murders.txt", skip=1, what=list("","","",""))

Read 50 records
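
Because the murders file itself is not reproduced here, the same call can be tried on a small stand-in file. The following is a minimal sketch, assuming a whitespace-separated file with a header line; the file contents and the names tmp and toy are invented for illustration and are not the real data.

```r
# Hypothetical four-column file written to a temporary location.
# The rows are invented placeholders, not the US murder data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("state population murder region",
             "A 100 1.5 North",
             "B 200 2.5 South",
             "C 300 3.5 West"), tmp)

# skip = 1 passes over the header line; what has one "" per column,
# so every column is read as character
toy <- scan(tmp, skip = 1, what = list("", "", "", ""), quiet = TRUE)

length(toy)   # number of list elements, one per column
toy[[1]]      # values of the first column
toy[[2]]      # the numbers are still character strings at this point
```

Replacing a "" with 0 in what would ask scan to read that column as numeric instead, at the cost of an error if the column contains anything that is not a number.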

The object produced by scan is a list rather than a dataframe, as you can see from:

class(murders)

[1] "list"

It is simple to convert the list to a dataframe using the as.data.frame function:

murder.frame<-as.data.frame(murders)
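
The conversion can be checked on a hand-built list of the same shape as the one scan returns. This is a sketch under assumptions: the list toy and its values are hypothetical, and stringsAsFactors = FALSE is passed explicitly because versions of R before 4.0.0 would otherwise turn character columns into factors.

```r
# Hypothetical stand-in for the list returned by scan:
# four character vectors of equal length, one per column
toy <- list(c("A", "B"),
            c("100", "200"),
            c("1.5", "2.5"),
            c("North", "South"))
class(toy)

# Each list element becomes one column of the dataframe
toy.frame <- as.data.frame(toy, stringsAsFactors = FALSE)
class(toy.frame)
dim(toy.frame)   # rows are records, columns are variables
```

Since the list has no names, the column names R generates are not meaningful, which is why they are usually replaced afterwards.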

You are likely to want to use the variable names from the file as the variable names in the dataframe. To do this, read just the first line of the file using scan with nlines=1:

murder.names<-scan("c:\\temp\\murders.txt",nlines=1,what="character",quiet=T)
murder.names

[1] "state" "population" "murder" "region"

Note the use of quiet=T to switch off the report of how many records were read. Now give these names to the columns of the dataframe:

names(murder.frame)<-murder.names
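
Put together, the whole sequence might look like the sketch below. It assumes a hypothetical whitespace-separated file with invented rows, and it adds one step that is not part of the text above: because every column was read as character, the numeric columns are converted explicitly with as.numeric.

```r
# Hypothetical file with a header line and two invented records
tmp <- tempfile(fileext = ".txt")
writeLines(c("state population murder region",
             "A 100 1.5 North",
             "B 200 2.5 South"), tmp)

# Read the data (skipping the header), then the header line on its own
toy <- scan(tmp, skip = 1, what = list("", "", "", ""), quiet = TRUE)
toy.names <- scan(tmp, nlines = 1, what = "character", quiet = TRUE)

# Convert to a dataframe and attach the names taken from the file
toy.frame <- as.data.frame(toy, stringsAsFactors = FALSE)
names(toy.frame) <- toy.names

# Added step (an assumption about what you want next):
# turn the character columns that hold numbers into numeric columns
toy.frame$population <- as.numeric(toy.frame$population)
toy.frame$murder <- as.numeric(toy.frame$murder)

str(toy.frame)
```

Reading the header and the body in two separate scan calls keeps each call simple; the price is that the file is opened twice.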
