Chapter 1. Introduction

Parallel processing is becoming increasingly important in the world of database computing. These days, databases often grow to enormous sizes and are accessed by larger and larger numbers of users. This growth strains the ability of single-processor and single-computer systems to handle the load. More and more organizations are turning to parallel processing technologies to give them the performance, scalability, and reliability they need. Oracle Corporation is a leader in providing parallel processing technologies in a wide range of products. This chapter provides an overview of parallel processing in general and also describes how parallel processing features are implemented in an Oracle environment.

About Parallel Processing

Parallel processing involves taking a large task, dividing it into several smaller tasks, and then working on each of those smaller tasks simultaneously. The goal of this divide-and-conquer approach is to complete the larger task in less time than it would have taken to do it in one large chunk.
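
To make the divide-and-conquer idea concrete, here is a minimal sketch, assuming Python and its standard concurrent.futures module (the four-way split is arbitrary): a large summation is divided into smaller tasks that run on separate processors at the same time, and the partial results are then combined.

    # A minimal sketch of divide and conquer, assuming Python's standard
    # concurrent.futures module. The large task (summing a million numbers)
    # is divided into four smaller tasks that run in parallel.
    from concurrent.futures import ProcessPoolExecutor

    def subtask(chunk):
        return sum(chunk)  # work on one small piece of the larger task

    if __name__ == "__main__":
        numbers = list(range(1_000_000))
        chunks = [numbers[i::4] for i in range(4)]    # divide into 4 smaller tasks
        with ProcessPoolExecutor(max_workers=4) as pool:
            partials = pool.map(subtask, chunks)      # work on all of them at once
        print(sum(partials))                          # combine: 499999500000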

Your local grocery store provides a good, real-life analogy to parallel processing. Your grocer must collect money from customers for the groceries they purchase. He could install just one checkout stand, with one cash register, and force everyone to go through the same line. However, the line would move slowly, people would get fidgety, and some would go elsewhere to shop. To speed up the process, your grocer doubtless uses several checkout stands, each with a cash register of its own. This is parallel processing at work. Instead of checking out one customer at a time, your grocer can now handle several at a time.

In our grocery store analogy, parallel processing required several checkout stands, each with its own cash register. Without trying to push the analogy too far, think of each checkout stand as a computer and each cash register as a processor. In a computing environment, the multiple processors in a parallel processing system may all reside on the same computer, or they may be spread across separate computers. When they are spread across separate computers, each computer is referred to as a node.

There are a few basic requirements of parallel computing:

  • Computer hardware that is designed to work with multiple processors and that provides a means of communication between those processors

  • An operating system that is capable of managing multiple processors

  • Application software that is capable of breaking large tasks into multiple smaller tasks that can be performed in parallel

Weather forecasting provides another real-life example of parallel processing at work. Satellites used for weather forecasting collect millions of bytes of data per second on the condition of Earth's atmosphere, the formation of clouds, wind intensity and direction, temperature, and so on. This huge amount of data must be processed by complex algorithms to arrive at a proper forecast. Thousands of iterations of computation may be needed to interpret this environmental data. Parallel computers are used to perform these computations in a timely manner so that a weather forecast can be generated early enough to be useful.

Why Parallel Processing?

Why do you need parallel processing? Why not just buy a faster computer? The answers to these questions lie largely in the laws of physics.

Computers were invented to solve problems faster than a human being could. Since day one, people have wanted computers to do more and to do it faster. Vendors responded with improved circuitry design for the processor, improved instruction sets, and improved algorithms to meet the demand for faster response time. Advances in engineering made it possible to add more logic circuits to processors. Processor circuit designs developed from small-scale to medium-scale integration, and then to large-scale and very large-scale integration. Some of today's processors contain billions of transistors. Clock cycle times have also been reduced over the years: some of today's processors have cycle times on the order of a nanosecond, and CPU frequencies have crossed the one-gigahertz barrier. All of these advances have led to processors that can do more work faster than ever before.

However, there are physical limits on this trend of constant improvement. The processing speed of a processor depends on the transmission speed of signals between the electronic components within it. That speed, in turn, is bounded by the speed of light, which is roughly 300 millimeters per nanosecond; at a one-gigahertz clock rate, for example, a signal can travel at most about 300 millimeters within a single cycle. Electrical signals actually travel at only a fraction of light speed, and approaching it would require optical communication methods within the processor. The speed of processors therefore cannot be increased beyond a certain point. Another limiting factor is that the density of transistors within a processor can be pushed only so far. Beyond that limit, the transistors create electromagnetic interference for one another.

As improvements in clock cycle times and circuit design approached these physical limits, hardware designers looked for other ways to increase performance. Parallelism is the result of those efforts: it enables multiple processors to work simultaneously on several parts of a task, completing that task faster than a single processor could.

Do You Need Parallel Processing?

Parallel processing not only increases processing power; when implemented properly, it also offers several other advantages:

  • Higher throughput

  • More fault tolerance

  • Better price/performance

There are hundreds of applications today that benefit from these advantages.

But parallelism is not the answer for everything, and it carries added costs. Synchronization between the parts of a program running on different processors is an overhead that must be managed and kept to a minimum. Administering a parallel computing environment is also more complicated than administering a serial one.
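
The following minimal sketch, assuming Python threads as a stand-in for any parallel environment, shows where this overhead comes from: every worker that updates shared state must first acquire a lock, and time spent waiting at that lock is work a serial program would never do.

    # A minimal sketch of synchronization overhead, assuming Python threads.
    # Four workers update one shared balance; each must wait for the lock,
    # and that waiting is overhead that a serial program does not pay.
    import threading

    balance = 0
    lock = threading.Lock()

    def deposit(times):
        global balance
        for _ in range(times):
            with lock:            # workers serialize here
                balance += 1

    workers = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(balance)  # 400000 -- correct only because of the lock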

Applications that already run satisfactorily in a serial environment may not benefit from a switch to a parallel processing environment. In addition, not all problems are amenable to parallel solutions. Unless your application is capable of decomposing large tasks into multiple smaller, parallelizable tasks, parallel processing will be of no benefit.

In short, parallel processing is useful only for applications that can break large tasks into smaller parallel tasks and that can manage the synchronization between those tasks. In addition, the performance gain must be large enough to justify the overhead of parallelism.
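
A brief illustration of this point, again in Python: the first loop below decomposes naturally because its iterations are independent of one another, while the second has a loop-carried dependency and must run serially no matter how many processors are available.

    # A minimal sketch of decomposability. The first loop's iterations are
    # independent, so they could be spread across processors. The second
    # loop's iterations each need the previous result, so no amount of
    # parallel hardware can speed that loop up.
    values = [2.0, 4.0, 8.0, 16.0]

    # Parallelizable: each iteration stands alone.
    squares = [v * v for v in values]

    # Not parallelizable: a loop-carried dependency forces serial execution.
    running = 0.0
    for v in values:
        running = running * 0.5 + v   # depends on the previous iteration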

Parallel Hardware Architectures

The subject of parallel computing has attracted attention from scientists and engineers, as well as from commercial vendors. Over the years, several commercially successful parallel hardware platforms have been developed. The most common of these are listed here, and are described in greater detail in Chapter 2.

Symmetric Multiprocessing systems

Symmetric Multiprocessing (SMP) systems have multiple CPUs. The number usually varies from 2 to 64. All of the CPUs in an SMP machine share the same memory, the system bus, and the I/O system. A single copy of the operating system controls all of the CPUs.
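
In programming terms, shared memory means that parallel workers can touch the same data directly. The minimal sketch below, assuming Python threads as stand-ins for CPUs, shows four workers reading one table that lives in the single shared memory; nothing has to be copied or transmitted between them.

    # A minimal sketch of the SMP shared-memory model, assuming Python
    # threads as stand-ins for CPUs: all workers read one table that
    # lives in the single shared memory space.
    import threading

    shared_table = {i: i * i for i in range(1000)}  # one copy, visible to all
    results = [0] * 4

    def worker(idx, keys):
        # Each worker reads the shared table directly; each writes only
        # its own slot of results, so no locking is needed here.
        results[idx] = sum(shared_table[k] for k in keys)

    threads = [threading.Thread(target=worker, args=(i, range(i, 1000, 4)))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(results))  # the sum of the squares 0..999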

Massively Parallel Processing systems

Massively Parallel Processing (MPP) systems consist of several nodes connected together. Each node has its own CPU, memory, bus, disks, and I/O system. Each node runs its own copy of the operating system. The number of nodes in an MPP system can vary from two all the way to several thousand.
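
Because no memory is shared, MPP programs cooperate by passing messages. In the minimal sketch below, assuming Python's multiprocessing module, processes stand in for nodes and a pipe stands in for the interconnect: each "node" computes on its own private data and ships only the result back.

    # A minimal sketch of the MPP message-passing model, assuming Python's
    # multiprocessing module: processes stand in for nodes, a Pipe stands
    # in for the interconnect, and data moves only as explicit messages.
    from multiprocessing import Process, Pipe

    def node(conn, data):
        conn.send(sum(data))   # compute on private data, send the result
        conn.close()

    if __name__ == "__main__":
        chunks = [range(0, 500), range(500, 1000)]
        pipes, procs = [], []
        for chunk in chunks:
            parent_end, child_end = Pipe()
            proc = Process(target=node, args=(child_end, chunk))
            proc.start()
            pipes.append(parent_end)
            procs.append(proc)
        total = sum(conn.recv() for conn in pipes)  # results arrive as messages
        for proc in procs:
            proc.join()
        print(total)  # 499500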

Clustered systems

A clustered system consists of several nodes loosely coupled using local area network (LAN) interconnection technology. Each of these nodes can be a single-processor machine or SMP machine. In a cluster, system software balances the workload among the nodes and provides for high availability.

Non-Uniform Memory Access systems

Non-Uniform Memory Access (NUMA) systems consist of several SMP systems that are interconnected to form a larger system. The memory in all of the SMP systems is connected together to form a single large memory space. NUMA systems run one copy of the operating system across all nodes.
