Virtual Machine (VM) Cluster

Cluster can be very expensive to build. It consists of many computer nodes, network switches, internet cable and other hardware equipment. It also requires installation of operating systems and managment softwares to be functioning. Dedicated administrating staff are also needed to maintain the whole system.

All of above make clusters in-accessible to students and individual researchers. Usually, having access to a real cluster is desirable but not a must for many research programs. At the early stages of many research projects, we usually pilot our method on a small data set or a subset sampled from a big data set. The purpose is to have a preliminary sense of the data at hand and test if the methods could work.

In these cases, a large scale computing environment cannot be fully utilized. It would require a lot of overhead to get access to a cluster even before we can compute a mean for our data.

Rather, we could use our controllable resources, i.e. personal computers, for emulating such a distributed computing environment. A good way to realize such an emulated environment is to build a local cluster consisting of virtual machines (VMs).

... Some introduction to the background on virtual machines ...

In this chapter, we demonstrate how to use Oracle VM VirtualBox to build a virtual cluster. In the following example, we will build a cluster with three Ubuntu nodes. The operating system can be any linux distribution (or even windows). Though we only create three nodes, more nodes can be added if your personal computer is powerful enough.

... Some discussion on the pros and cons of different linux distributions, such as Ubuntu, Redhat, CentOS, etc. There are also some packaged CD-ROM for easy installation, e.g. ROCKS[^rocks_cluster_url]
rocks_cluster_url. http://www.rocksclusters.org/wordpress/

The demonstration in this chapter is performed on my Macbook Pro (late 2011) with OS X Yosemite (10.10.5). The hardware configuration of the laptop is listed below:

  • CPU: 2.3 GHz Intel Core i5
  • RAM: 16G DDR3 1333 MHz
  • Disk: 512G SSD
  • Graphics: Intel HD Graphics 3000 512 MB

1. Creating One VM Node

2. Creating Additional VM Nodes

3. Connecting VM Nodes as a Cluster

TODO: Docker?