Criteria for Choosing a Sorting Algorithm

To choose a sorting algorithm, consider the qualitative criteria in Table 4-6. These may help your initial decision, but you likely will need more quantitative measures to guide you.

Table 4-6. Criteria for choosing a sorting algorithm

Criteria

Sorting algorithm

Only a few items

Insertion Sort

Items are mostly sorted already

Insertion Sort

Concerned about worst-case scenarios

Heap Sort

Interested in a good average-case result

Quicksort

Items are drawn from a dense universe

Bucket Sort

Desire to write as little code as possible

Insertion Sort

To choose the appropriate algorithm for different data, you need to know some properties about your input data. We created several benchmark data sets on which to show how the algorithms presented in this chapter compare with one another. Note that the actual values of the generated tables are less important because they reflect the specific hardware on which the benchmarks were run. Instead, you should pay attention to the relative performance of the algorithms on the corresponding data sets:

Random strings

Throughout this chapter, we have demonstrated performance of sorting algorithms when sorting 26-character strings that are permutations of the letters in the alphabet. Given there are n! such strings, or roughly 4.03*1026 strings, there are few duplicate strings in our sample data sets. In addition, the cost of comparing elements is not constant, because of the occasional need to compare multiple characters.

Double precision floating-point values

Using available pseudorandom generators available on most operating systems, we generate a set of random numbers from the range [0,1). There are essentially no duplicate values in the sample data set and the cost of comparing two elements is a fixed constant.

The input data provided to the sorting algorithms can be preprocessed to ensure some of the following properties (not all are compatible):

Sorted

The input elements can be presorted into ascending order (the ultimate goal) or in descending order.

Killer median-of-three

Musser (1997) discovered an ordering that ensures that Quicksort requires O(n2) comparisons when using median-of-three to choose a pivot.

Nearly sorted

Given a set of sorted data, we can select k pairs of elements to swap and the distance d with which to swap (or 0 if any two pairs can be swapped). Using this capability, you can construct input sets that might more closely match your input set.

The upcoming tables are ordered left to right, based upon how well the algorithms perform on the final row in the table. Each section has four tables, showing performance results under the four different situations outlined earlier in this chapter.

String Benchmark Results

Because Insertion Sort and Selection Sort are the two slowest algorithms in this chapter on randomly uniform data (by several orders of magnitude) we omit these algorithms from Tables 4-7 through 4-11. However, it is worth repeating that on sorted data (Table 4-8) and nearly sorted data ( Tables 4-10 and 4-11) Insertion Sort will outperform the other algorithms, often by an order of magnitude. To produce the results shown in Tables 4-7 through 4-11, we executed each trial 100 times on the high-end computer and discarded the best and worst performers. The average of the remaining 98 trials is shown in these tables. The columns labeled Quicksort BFPRT4 minSize=4 refer to a Quicksort implementation that uses BFPRT (with groups of 4) to select the partition value and which switches to Insertion Sort when a subarray to be sorted has four or fewer elements.

Table 4-7. Performance results (in seconds) on random 26-letter permutations of the alphabet

n

Hash Sort17,576 buckets

Quicksort median-of-three

Heap Sort

Median Sort

Quicksort BFPRT4 minSize=4

4,096

0.0012

0.0011

0.0013

0.0023

0.0041

8,192

0.002

0.0024

0.0031

0.005

0.0096

16,384

0.0044

0.0056

0.0073

0.0112

0.022

32,768

0.0103

0.014

0.0218

0.0281

0.0556

65,536

0.0241

0.0342

0.0649

0.0708

0.1429

131,072

0.0534

0.0814

0.1748

0.1748

0.359

Table 4-8. Performance (in seconds) on ordered random 26-letter permutations of the alphabet

n

Hash Sort17,576 buckets

Quicksort median-of-three

Heap Sort

Median Sort

Quicksort BFPRT4 minSize=4

4,096

0.0011

0.0007

0.0012

0.002

0.0031

8,192

0.0019

0.0015

0.0027

0.0042

0.007

16,384

0.0037

0.0036

0.0062

0.0094

0.0161

32,768

0.0074

0.0082

0.0157

0.0216

0.0381

65,536

0.0161

0.0184

0.0369

0.049

0.0873

131,072

0.0348

0.0406

0.0809

0.1105

0.2001

Table 4-9. Performance (in seconds) on killer median data

n

Hash Sort 17,576 buckets

Heap Sort

Median Sort

Quicksort BFPRT4 minSize=4

Quicksort median-of-three[a]

[a]

4,096

0.0011

0.0012

0.0021

0.0039

0.0473

8,192

0.0019

0.0028

0.0045

0.0087

0.1993

16,384

0.0038

0.0066

0.0101

0.0194

0.8542

32,768

0.0077

0.0179

0.024

0.0472

4.083

65,536

0.0171

0.0439

0.056

0.1127

17.1604

131,072

0.038

0.1004

0.1292

0.2646

77.4519

[a] Because the performance of QUICKSORT median-of-three degrades so quickly, only 10 trials were executed; the table shows the average of eight runs once the best and worst performers were discarded.

Table 4-10. Performance (in seconds) on 16 random pairs of elements swapped eight locations away

n

Hash Sort17,576 buckets

Quicksort median-of-three

Heap Sort

Median Sort

Quicksort BFPRT4 minSize=4

4,096

0.0011

0.0007

0.0012

0.002

0.0031

8,192

0.0019

0.0015

0.0027

0.0042

0.007

16,384

0.0038

0.0035

0.0063

0.0094

0.0161

32,768

0.0072

0.0081

0.0155

0.0216

0.038

65,536

0.0151

0.0182

0.0364

0.0491

0.0871

131,072

0.0332

0.0402

0.08

0.1108

0.2015

Table 4-11. Performance (in seconds) on n/4 random pairs of elements swapped four locations away

n

Hash Sort17,576 buckets

Quicksort median-of-three

Heap Sort

Median Sort

Quicksort BFPRT4 minSize=4

4,096

0.0011

0.0008

0.0012

0.002

0.0035

8,192

0.0019

0.0019

0.0028

0.0044

0.0078

16,384

0.0039

0.0044

0.0064

0.0096

0.0175

32,768

0.0073

0.01

0.0162

0.0221

0.0417

65,536

0.0151

0.024

0.0374

0.0505

0.0979

131,072

0.0333

0.0618

0.0816

0.1126

0.2257

Double Benchmark Results

The benchmarks using double floating-point values ( Tables 4-12 through 4-16) eliminate much of the overhead that was simply associated with string comparisons. Once again, we omit Insertion Sort and Selection Sort from these tables.

Table 4-12. Performance (in seconds) on random floating-point values

n

Bucket Sort

Quicksort median-of-three

Median Sort

Heap Sort

Quicksort BFPRT4 minSize=4

4,096

0.0009

0.0009

0.0017

0.0012

0.0003

8,192

0.0017

0.002

0.0039

0.0029

0.0069

16,384

0.0041

0.0043

0.0084

0.0065

0.0157

32,768

0.0101

0.0106

0.0196

0.0173

0.039

65,536

0.0247

0.0268

0.0512

0.0527

0.1019

131,072

0.0543

0.0678

0.1354

0.1477

0.26623

Table 4-13. Performance (in seconds) on ordered floating-point values

n

Bucket Sort

Heap Sort

Median Sort

Quicksort median-of-three

Quicksort BFPRT4 minSize=4

4,096

0.0007

0.0011

0.0015

0.0012

0.0018

8,192

0.0015

0.0024

0.0032

0.0025

0.004

16,384

0.0035

0.0052

0.0067

0.0055

0.0089

32,768

0.0073

0.0127

0.015

0.0133

0.0208

65,536

0.0145

0.0299

0.0336

0.0306

0.0483

131,072

0.0291

0.065

0.0737

0.0823

0.1113

Table 4-14. Performance (in seconds) on killer median data

n

Bucket Sort

Heap Sort

Median Sort

Quicksort median-of-three

Quicksort BFPRT4 minSize=4

4,096

0.0008

0.0011

0.0015

0.0015

0.0025

8,192

0.0016

0.0024

0.0034

0.0033

0.0056

16,384

0.0035

0.0053

0.0071

0.0076

0.0122

32,768

0.0079

0.0134

0.0164

0.0192

0.0286

65,536

0.0157

0.0356

0.0376

0.0527

0.0686

131,072

0.0315

0.0816

0.0854

0.1281

0.1599

Table 4-15. Performance (in seconds) on 16 random pairs of elements swapped eight locations away

n

Bucket Sort

Heap Sort

Median Sort

Quicksort median-of-three

Quicksort BFPRT4 minSize=4

4,096

0.0007

0.0011

0.0015

0.0012

0.0018

8,192

0.0015

0.0024

0.0032

0.0025

0.004

16,384

0.0035

0.0051

0.0067

0.0054

0.0089

32,768

0.0071

0.0127

0.0151

0.0133

0.0209

65,536

0.0142

0.0299

0.0336

0.0306

0.0482

131,072

0.0284

0.065

0.0744

0.0825

0.111

Table 4-16. Performance (in seconds) on n/4 random pairs of elements swapped four locations away

n

Bucket Sort

Heap Sort

Quicksort median-of-three

Median Sort

Quicksort BFPRT4 minSize=4

4,096

0.0001

0.0014

0.0015

0.0019

0.005

8,192

0.0022

0.0035

0.0032

0.0052

0.012

16,384

0.0056

0.0083

0.0079

0.0099

0.0264

32,768

0.0118

0.0189

0.0189

0.0248

0.0593

65,536

0.0238

0.0476

0.045

0.0534

0.129

131,072

0.0464

0.1038

0.1065

0.1152

0.2754

Get Algorithms in a Nutshell now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.