Processing huge NumPy arrays with memory mapping

Sometimes, we need to deal with NumPy arrays that are too big to fit in system memory. A common solution is to use memory mapping and implement out-of-core computations. The array is stored in a file on the hard drive, and we create a memory-mapped object pointing to this file, which can be used like a regular NumPy array. Accessing a portion of the array causes the corresponding data to be fetched automatically from the hard drive, so we only consume the memory we actually use.

How to do it...

  1. Let's create a memory-mapped array:
    In [1]: import numpy as np
    In [2]: nrows, ncols = 1000000, 100
    In [3]: f = np.memmap('memmapped.dat', dtype=np.float32, 
                          mode='w+', shape=(nrows, ncols))
  2. Let's feed the array with random ...
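    A minimal sketch of how this step could look, assuming we fill the memory-mapped array one column at a time so that only a single column of values is held in memory at once, then flush the changes to disk; the column-wise loop and the read-back at the end are illustrative assumptions, not necessarily the recipe's own code:

        import numpy as np

        nrows, ncols = 1000000, 100
        f = np.memmap('memmapped.dat', dtype=np.float32,
                      mode='w+', shape=(nrows, ncols))

        # Fill one column at a time so that only one column's worth
        # of random values exists in memory per iteration.
        for i in range(ncols):
            f[:, i] = np.random.rand(nrows)

        # Write any pending changes from memory to the file on disk.
        f.flush()

        # Reopen the file in read-only mode; slicing loads only the
        # requested portion of the array from the hard drive.
        g = np.memmap('memmapped.dat', dtype=np.float32,
                      mode='r', shape=(nrows, ncols))
        first_column_mean = g[:, 0].mean()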
