DataFrames and RDatasets

Tabular datasets often contain missing values. A defining feature of statistical languages is that they can represent and handle such gaps directly.

In Julia, the DataFrames package was developed to handle such cases, and it is the subject of this chapter.

The DataFrames package

The package extends the Julia base by adding three new types:

  • NA represents a missing value. Its type has exactly one value, NA.
  • DataArray emulates Julia's standard Array type but can also store missing values.
  • DataFrame represents tabular datasets such as those found in typical databases or spreadsheets. The concept ...
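
The idea behind the first two types, a single missing-value marker that can sit inside an otherwise ordinary array, can be sketched with Julia's built-in `missing` value. This is an assumption about the reader's environment: it targets Julia 1.0 or later, where `missing` in Base plays the role described above for NA, and it is not the NA/DataArray API that the chapter itself uses. The variable names are illustrative only.

```julia
# Assumes Julia 1.0 or later; uses only Base and the Statistics standard library.
# `missing` stands in here for the NA value described in the text.
using Statistics

# An ordinary vector that holds two integers and one missing entry.
readings = [1, missing, 3]

# A reduction over data with a gap propagates the gap.
total_with_gap = sum(readings)

# skipmissing drops the missing entries before the reduction.
total = sum(skipmissing(readings))
average = mean(skipmissing(readings))

# ismissing tests individual entries, so it can be used to count gaps.
gaps = count(ismissing, readings)

println("sum with gap:  ", total_with_gap)
println("sum, skipping: ", total)
println("mean:          ", average)
println("missing count: ", gaps)
```

The design point is that the missing marker propagates through computations unless it is explicitly skipped, so a gap in the data cannot silently distort a result.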
