Following Statistical Patterns

Suppose that you are building a table of customers and populating it with names that will be used later in searches. How would you generate the values? For example, should the values be random collections of characters like the following?

    -32nr -32nr3121ne -e21e
    323-=11r- r
    0-vmdw-dwv0-[o- rr0-32r2 0
    r4i32r -rm32r3p=x ewifef-432fr32o3-==

I got these values by merely pecking randomly at the keyboard, but they are unlikely to serve well as sample customer names. Real names vary only within an acceptable range (such as Jim or Jane), and when you develop a table of test data, it’s important that the values resemble real data as closely as possible. Your values should come from a set of known values but should occur randomly within that set, without following a predictable pattern.
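As a minimal sketch of that idea, the following Python snippet draws names uniformly at random from a fixed list. The list `KNOWN_NAMES` and the helper `random_name` are hypothetical names chosen for illustration; they are not taken from the book.

```python
import random

# Hypothetical pool of known first names; any realistic list would do.
KNOWN_NAMES = ["Jim", "Jane", "John", "James", "Scott"]

def random_name(rng):
    """Return one name chosen uniformly at random from KNOWN_NAMES."""
    return rng.choice(KNOWN_NAMES)

rng = random.Random(42)  # fixed seed so repeated runs give the same sample
sample = [random_name(rng) for _ in range(10)]
print(sample)
```

Every generated value is a plausible name, yet the order of the names follows no predictable pattern. Each name here is equally likely, which is the limitation the next paragraph addresses.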

Another aspect to consider is the variance. Very few names are unique. For instance, in the United States, you will find frequent occurrences of the male names John, James, or Scott (but few cases of names like Arup). So when you generate test data, you usually want to make sure that the values are random but also that they follow a real-world statistical distribution. For instance, let’s say that in our population the distribution of first names should look like the following:

    First name    Percentage
    ----------    ----------
    Alan          10%
    Barbara       10%
    Charles        5%
    David          5%
    Ellen         15%
    Frank         20%
    George        10%
    Hillary        5%
    Iris          10%
    Josh          10%
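One way to produce values that follow the distribution listed above is to draw a random integer from 1 to 100 and map it onto cumulative percentage bands. The Python sketch below illustrates that technique under stated assumptions: the names `NAME_WEIGHTS` and `pick_first_name` are hypothetical, and this is not necessarily the approach the book goes on to use.

```python
import random
from collections import Counter

# Target distribution from the list above: (first name, percentage).
NAME_WEIGHTS = [
    ("Alan", 10), ("Barbara", 10), ("Charles", 5), ("David", 5),
    ("Ellen", 15), ("Frank", 20), ("George", 10), ("Hillary", 5),
    ("Iris", 10), ("Josh", 10),
]

def pick_first_name(rng):
    """Map a random integer in 1..100 onto cumulative percentage bands."""
    roll = rng.randint(1, 100)  # inclusive on both ends
    upper = 0
    for name, pct in NAME_WEIGHTS:
        upper += pct  # upper bound of this name's band
        if roll <= upper:
            return name
    raise ValueError("percentages must sum to 100")

rng = random.Random(2024)  # fixed seed so repeated runs give the same counts
TOTAL = 100_000
counts = Counter(pick_first_name(rng) for _ in range(TOTAL))

# Compare the observed share of each name with its target percentage.
for name, pct in NAME_WEIGHTS:
    observed = 100 * counts[name] / TOTAL
    print(f"{name:<8} target {pct:>2}%  observed {observed:.1f}%")
```

With a large enough sample, the observed share of each name should land close to its target, while the sequence of individual names stays unpredictable. The same banding idea carries over to any random number source that can return an integer in a fixed range.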

When populating the column FIRST_NAME ...

Get Oracle PL/SQL for DBAs now with the O’Reilly learning platform.
