Partitioned data is the fundamental characteristic of a distributed database system. How that partitioning is done can make the difference between a system that can thrive and adapt and one that requires constant triage. In this section we describe a process you can use to ensure that your distributed database falls into the former category.
Many technical writings use the term data fragmentation instead of data partitioning; the two terms are interchangeable. Oracle’s documentation and literature prefer data partitioning, possibly because “fragmentation” in Oracle parlance has come to describe a segment that is stored in many noncontiguous extents.
The obvious approach to data partitioning is to locate data where it is used most. While this is certainly a reasonable objective, it is not always simple to realize. For example, several sites may emerge as good candidates, owners of existing data may be unwilling to relocate it, or other applications may have conflicting requirements. One way to uncover these issues is to follow a step-by-step methodology that addresses potential problems and produces a shared knowledge base of who uses the data and how changes affect the distributed database.
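As a rough illustration of the "locate data where it is used most" heuristic, and of how more than one site can emerge as a candidate, consider the following minimal Python sketch. Everything in it is an assumption made for illustration: the table and site names, the access counts, the function names, and the `contested_ratio` threshold are hypothetical and do not come from Oracle or from any measured workload. The sketch also ignores the nontechnical issues mentioned above, such as data ownership and the conflicting requirements of other applications.

```python
from collections import defaultdict


def rank_sites(access_counts):
    """Total the accesses per (table, site) and rank sites for each table.

    access_counts is an iterable of (table, site, accesses) tuples.
    This input format is a hypothetical one chosen for the example.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for table, site, accesses in access_counts:
        totals[table][site] += accesses
    # Sort by descending access count, then by site name for stable output.
    return {
        table: sorted(sites.items(), key=lambda kv: (-kv[1], kv[0]))
        for table, sites in totals.items()
    }


def suggest_placement(access_counts, contested_ratio=0.8):
    """Suggest the busiest site per table and flag close runners-up.

    A site is reported as a rival when its access count is at least
    contested_ratio times that of the busiest site. The 0.8 default
    is an arbitrary illustrative threshold, not a recommendation.
    """
    suggestions = {}
    for table, ranked in rank_sites(access_counts).items():
        best_site, best_count = ranked[0]
        rivals = [
            site
            for site, count in ranked[1:]
            if count >= contested_ratio * best_count
        ]
        suggestions[table] = {"site": best_site, "contested_by": rivals}
    return suggestions


# Hypothetical access counts gathered per table and site.
sample = [
    ("ORDERS", "NEW_YORK", 900),
    ("ORDERS", "LONDON", 850),
    ("ORDERS", "TOKYO", 100),
    ("CUSTOMERS", "LONDON", 400),
    ("CUSTOMERS", "NEW_YORK", 50),
]

placement = suggest_placement(sample)
for table, info in placement.items():
    print(table, info)
```

With this sample data, CUSTOMERS has one clear home, while ORDERS is contested by two sites with nearly equal usage. A case like ORDERS is one that a simple count cannot settle.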
The methodology we recommend is derived from one that Marie Buretta proposes in her book Data Replication (John Wiley & Sons, 1997). The process consists of the following steps:
Identify users, locations, and activities