Pavan Balaji, Qian Zhu, and Wu-Chun Feng
General-purpose graphics processing units (GPGPUs or GPUs) are becoming increasingly popular as accelerator devices for core computational kernels in scientific and enterprise computing applications. The advent of programming models such as NVIDIA's CUDA, AMD/ATI's Brook+, and the Open Computing Language (OpenCL) has further accelerated the adoption of GPUs by allowing many applications and high-level libraries to be ported to them [4–7]. While GPUs have proliferated widely into high-end computing systems, current programming models require each computational node to be equipped with one or more local GPUs, and application executions are tightly coupled to the physical GPU hardware. Thus, any change to the hardware (e.g., if it must be taken down for maintenance) requires the application to stall.
Recent developments in virtualization techniques, on the other hand, have advocated decoupling the application's view of "local hardware resources" (such as processors and storage) from the physical hardware itself; that is, each application (or user) gets a "virtual independent view" of a potentially shared set of physical resources. Such decoupling has many advantages, including ease of management, the ability to hot-swap the available physical resources on demand, improved resource utilization, and fault tolerance.
For GPUs, virtualization ...