Hardware Design Criteria

With the functional requirements determined, the next step was to establish design criteria for the SOHO server hardware. The relative priorities we assigned for our SOHO server break down as follows; your priorities may differ.

Price

Price is moderately important for this system. We don’t want to spend money needlessly, but we will spend what it takes to meet our other criteria.

Reliability

Reliability is the single most important consideration.

Size

Size is unimportant. Our SOHO server will sit in a server closet that has room to spare.

Noise level

Noise level is unimportant, again because the server will reside in a closet, far from our ears.

Expandability

Expandability is moderately important. Our server will initially have four hard drives and an optical drive installed, but we may want to expand the storage subsystem later. Similarly, although we’ll use the embedded video, S-ATA, and network interfaces initially, we may eventually install one or two SCSI host adapters, additional network interfaces, and so on.

Processor performance

Processor performance is relatively unimportant, at least initially. Our SOHO server will run Linux for file and print services, which place little demand on the CPU. However, we expect that the server will eventually run at least some server-based applications, perhaps X11 apps that display on workstations or server-based applications such as Mailman or SquirrelMail. The incremental cost of installing a moderately fast Pentium 4 is small enough that we’ll do it now and have done with it.

Video performance

Video performance is of no importance. In fact, other than for initial configuration, we’ll probably run our SOHO server headless—that is, we’ll temporarily install a monitor while we install and configure Linux, but subsequently manage the server from a desktop system elsewhere on the network. Our only requirement for the video adapter is that it run X, because we prefer to use the Linux GUI installation utilities rather than doing everything at the command line. (Yes, yes, we know. Real Men don’t run X on servers. What can we say? We’re wimps.)

Disk capacity/performance

Disk capacity and performance are second only to reliability. Back when our old server had only one ATA drive and Robert happened to be doing something very disk intensive on the server, Barbara sometimes used to shout, “What are you doing to the server?” as she waited (and waited) for a document to load on her system. We found that mirroring a pair of 7,200 RPM ATA disks killed two birds with one stone. It mostly solved the disk contention problem, because having two spindles speeds up reads considerably, and also provided disk redundancy. If one drive failed, the other contained everything written to the mirror set right up to the instant of failure. Based on our own experience, we decided that although a mirrored pair of ATA drives would probably suffice, we might just as well install a full ATA RAID and have done with it.

RAID for SOHO servers

RAID is an acronym for Redundant Array of Inexpensive Disks. A RAID stores data on two or more physical hard drives, thereby reducing the risk of losing data when a drive fails. Some types of RAID also increase read and/or write performance relative to a single drive.

Five levels of RAID are defined, RAID 1 through RAID 5. RAID levels are optimized to have different strengths, including level of redundancy, optimum file size, random versus sequential read performance, and random versus sequential write performance. RAID 1 and RAID 5 are commonly used in PC servers; RAID 3 is used rarely; and RAID 2 and RAID 4 are almost never used. The RAID levels typically used on SOHO servers are:

RAID 1

RAID 1 uses two drives that contain exactly the same data. Every time the system writes to the array, it writes identical data to each drive. If one drive fails, the data can be read from the surviving drive. Because data must be written twice, RAID 1 writes are a bit slower than writes to a single drive. Because data can be read from either drive in a RAID 1, reads are somewhat faster. RAID 1 is also called mirroring, if both drives share one controller, or duplexing, if each drive has its own controller.
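To make the mirroring behavior concrete, here is a minimal sketch in Python. It is a toy model, not how any real controller or operating system implements RAID 1: the `Mirror` class and its method names are our own invention for illustration, and each "drive" is just a dictionary of blocks.

```python
class Mirror:
    """Toy RAID 1: every block is written to both drives."""

    def __init__(self):
        # Each "drive" maps block number -> data; None marks a failed drive.
        self.drives = [{}, {}]

    def write(self, block_no, data):
        # Identical data goes to every drive that is still working.
        for drive in self.drives:
            if drive is not None:
                drive[block_no] = data

    def fail(self, index):
        # Simulate the loss of one physical drive.
        self.drives[index] = None

    def read(self, block_no):
        # Any surviving drive can satisfy the read.
        for drive in self.drives:
            if drive is not None:
                return drive[block_no]
        raise IOError("both drives have failed")


mirror = Mirror()
mirror.write(0, b"payroll")
mirror.fail(0)                   # first drive dies
survivor_copy = mirror.read(0)   # data still comes back from the second drive
```

The write loop touches both drives, which is why writes cost a little extra, and the read loop can use whichever drive is available, which is why the array survives the loss of either one.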

RAID 1 provides very high redundancy, but is the least efficient of the RAID levels in terms of hard drive usage. For example, with two 160 GB hard drives in a RAID 1 array, only 160 GB of total disk space is visible to the system. RAID 1 may be implemented with a physical RAID 1 controller or in software by the operating system. Some motherboards have embedded ATA or S-ATA interfaces that offer native RAID 1 support.

RAID 5

RAID 5 uses three or more physical hard drives. The RAID 5 controller divides data that is to be written to the array into blocks and calculates parity blocks for the data. Data blocks and parity blocks are interleaved on each physical drive, so each of the three or more drives in the array contains both data blocks and parity blocks. If any one drive in the RAID 5 fails, the data blocks contained on the failed drive can be recreated from the parity data stored on the surviving drives.
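The recovery step can be sketched in a few lines of Python. We assume here that parity is computed as a byte-wise XOR of the data blocks, which is the usual choice for RAID 5 although the description above does not name the parity function. The helper `xor_blocks` and the sample blocks are hypothetical, and the sketch covers a single stripe on a three-drive array without modeling how parity rotates among the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-drive array: two data blocks plus one parity block.
data_a = b"ABCD"
data_b = b"WXYZ"
parity = xor_blocks([data_a, data_b])

# Suppose the drive holding data_b fails. XOR-ing the surviving data block
# with the parity block recreates the missing block.
rebuilt_b = xor_blocks([data_a, parity])
```

The same call works no matter which of the three blocks is lost, because XOR-ing any two of them yields the third.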

RAID 5 is optimized for the type of disk usage common in an office environment: many random reads, and fewer random writes of relatively small files. RAID 5 reads are faster than those from a single drive because RAID 5 has three or more spindles spinning and delivering data simultaneously. RAID 5 writes can also be a bit faster than single-drive writes, although each write must also update parity, which slows small random writes. RAID 5 uses hard drive space more efficiently than RAID 1.

In effect, although RAID 5 uses distributed parity, a RAID 5 array can be thought of as dedicating one of its physical drives to parity data. For example, with three 160 GB drives in a RAID 5 array, 320 GB—the capacity of two of the three drives—is visible to the system. With RAID 5 and four 160 GB drives, 480 GB—the capacity of three of the four drives—is visible to the system. RAID 5 may be implemented with a physical RAID 5 controller or in software by the operating system. Few motherboards have embedded RAID 5 support.
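The capacity arithmetic for RAID 1 and RAID 5 can be written as a small Python function. This is only a sketch of the rule of thumb described above; the function name `usable_gb` is our own, and it assumes all drives in the array are the same size.

```python
def usable_gb(level, drives, drive_gb):
    """Capacity visible to the system, assuming identical drives."""
    if level == 1:
        if drives != 2:
            raise ValueError("RAID 1 as described here uses exactly two drives")
        return drive_gb                   # the second drive holds a copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs three or more drives")
        return (drives - 1) * drive_gb    # one drive's worth goes to parity
    raise ValueError("only RAID 1 and RAID 5 are modeled in this sketch")


raid1_two_drives = usable_gb(1, 2, 160)    # 160
raid5_three_drives = usable_gb(5, 3, 160)  # 320
raid5_four_drives = usable_gb(5, 4, 160)   # 480
```

The results match the figures given above: the RAID 5 overhead stays fixed at one drive, so efficiency improves as drives are added.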

RAID 3

RAID 3 uses three or more physical hard drives. One drive is dedicated to storing parity data, and user data is distributed among the other drives. RAID 3 is the least common RAID level used for PC servers because its characteristics are not optimal for the disk usage patterns typical of small-office LANs. RAID 3 is optimized for sequential reads of very large files, and so is used primarily for applications such as streaming video.

Then there is the so-called RAID 0, which isn’t really RAID at all because it provides no redundancy:

RAID 0

RAID 0, also called striping, uses two or more physical hard drives; the example here assumes two. Data written to the array is divided into blocks, which are written to each drive in an alternating manner. For example, if you write a 256 KB file to a RAID 0 that uses 64 KB blocks, the first 64 KB block may be written to the first drive in the RAID 0. The second 64 KB block is written to the second drive, the third 64 KB block to the first drive, and the final 64 KB block to the second drive. The file itself exists only as fragments distributed across both physical drives, so if either drive fails all data on the array is lost. That means that data stored on a RAID 0 is more at risk than data stored on a single drive, so in that sense a RAID 0 can actually be thought of as less redundant than the zero redundancy of a single drive. RAID 0 is used because it provides the fastest possible disk performance. Reads and writes are very fast, because they can use the combined bandwidth of all the drives in the array.
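The 256 KB example can be traced with a short Python sketch. The `stripe` function is hypothetical and models only which drive each block lands on; the sample file is filled with a different byte value in each 64 KB block so the blocks can be told apart.

```python
KB = 1024


def stripe(data, block_size, drive_count=2):
    """Split data into blocks and deal them out to the drives in rotation."""
    drives = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        drives[block_index % drive_count].append(data[offset:offset + block_size])
    return drives


# A 256 KB "file" whose four 64 KB blocks are filled with 0, 1, 2, and 3.
file_data = b"".join(bytes([n]) * (64 * KB) for n in range(4))

drive0, drive1 = stripe(file_data, 64 * KB)
blocks_on_drive0 = [block[0] for block in drive0]  # first and third blocks
blocks_on_drive1 = [block[0] for block in drive1]  # second and fourth blocks
```

Neither drive holds the whole file, which is why losing either one makes the file unrecoverable.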

Finally, there is stacked RAID, which is an “array of arrays” rather than an array of disks. Stacked RAID can be thought of as an array that replaces individual physical disks with subarrays. The advantage of stacked RAID is that it combines the advantages of two RAID levels. The disadvantage is that it requires a lot of physical hard drives.

Stacked RAID

The most common stacked RAID used in PC servers is referred to as RAID 0+1, RAID 1+0, or RAID 10. (Strictly speaking, a stripe across mirrored pairs, as described here, is RAID 1+0 or RAID 10, whereas RAID 0+1 is a mirror of two striped sets, but the names are often used interchangeably and we use RAID 0+1 throughout.) A RAID 0+1 uses four physical drives arranged as two RAID 1 arrays of two drives each. Each RAID 1 array would normally appear to the system as a single drive, but RAID 0+1 takes things a step further by creating a RAID 0 array from the two RAID 1 arrays. For example, a RAID 0+1 with four 160 GB drives comprises two RAID 1 arrays, each with two 160 GB drives. Each RAID 1 is visible to the system as a single 160 GB drive. Those two RAID 1 arrays are then combined into one RAID 0 array, which is visible to the system as a single 320 GB RAID 0. Because the system “sees” a RAID 0, performance is very high. Because the RAID 0 components are actually RAID 1 arrays, the data is very well protected. If any single drive in the RAID 0+1 array fails, the array continues to function, although redundancy is lost until the drive is replaced and the array is rebuilt.
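The four-drive layout described above, a stripe across two mirrored pairs, can be modeled in Python to check both the visible capacity and the single-drive failure claim. The drive labels and function names are our own, and the sketch assumes the array stays usable as long as each mirrored pair has at least one working drive.

```python
from itertools import combinations

# Two RAID 1 pairs, striped together. Labels are illustrative only.
MIRRORS = [("A1", "A2"), ("B1", "B2")]
DRIVE_GB = 160


def capacity_gb():
    """Each mirrored pair contributes one drive's worth of space."""
    return len(MIRRORS) * DRIVE_GB


def survives(failed):
    """The array works if every mirrored pair still has a live drive."""
    return all(any(drive not in failed for drive in pair) for pair in MIRRORS)


all_drives = [drive for pair in MIRRORS for drive in pair]

# Any one drive can fail without taking the array down.
single_ok = all(survives({drive}) for drive in all_drives)

# For completeness, try every possible two-drive failure.
double_results = {
    pair: survives(set(pair)) for pair in combinations(all_drives, 2)
}
```

Under these assumptions the array reports 320 GB and survives any single failure. It also survives a second failure unless that failure hits the other drive of the same mirrored pair, which is why replacing a failed drive promptly matters.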

Until a few years ago RAID 0+1 was uncommon on small servers because it required SCSI drives and host adapters, and therefore cost thousands of dollars to implement. Nowadays, thanks to inexpensive ATA drives, the incremental cost of RAID 0+1 is very small. Instead of buying one $200 ATA hard drive for your small server, you can buy four $100 ATA hard drives and a $50 ATA RAID adapter. You may not even need to buy the ATA RAID adapter because some motherboards include native RAID 0+1 support. Data protection doesn’t come much cheaper than that.
