Selection Algorithm - Partition-based General Selection Algorithm

Partition-based General Selection Algorithm

A general selection algorithm that is efficient in practice, but has poor worst-case performance, was conceived by the inventor of quicksort, C.A.R. Hoare, and is known as Hoare's selection algorithm or quickselect.

In quicksort, there is a subprocedure called partition that can, in linear time, group a list (ranging from indices left to right) into two parts, those less than a certain element, and those greater than or equal to the element. Here is pseudocode that performs a partition about the element list:

function partition(list, left, right, pivotIndex) pivotValue := list swap list and list // Move pivot to end storeIndex := left for i from left to right-1 if list <= pivotValue swap list and list increment storeIndex swap list and list // Move pivot to its final place return storeIndex

In quicksort, we recursively sort both branches, leading to best-case Ω(n log n) time. However, when doing selection, we already know which partition our desired element lies in, since the pivot is in its final sorted position, with all those preceding it in sorted order and all those following it in sorted order. Thus a single recursive call locates the desired element in the correct partition:

function select(list, left, right, k) if left = right // If the list contains only one element return list // Return that element // select pivotIndex between left and right pivotNewIndex := partition(list, left, right, pivotIndex) pivotDist := pivotNewIndex - left + 1 // The pivot is in its final sorted position, // so pivotDist reflects its 1-based position if list were sorted if pivotDist = k return list else if k < pivotDist return select(list, left, pivotNewIndex - 1, k) else return select(list, pivotNewIndex + 1, right, k - pivotDist)

Note the resemblance to quicksort: just as the minimum-based selection algorithm is a partial selection sort, this is a partial quicksort, generating and partitioning only O(log n) of its O(n) partitions. This simple procedure has expected linear performance, and, like quicksort, has quite good performance in practice. It is also an in-place algorithm, requiring only constant memory overhead, since the tail recursion can be eliminated with a loop like this:

function select(list, left, right, k) loop // select pivotIndex between left and right pivotNewIndex := partition(list, left, right, pivotIndex) pivotDist := pivotNewIndex - left + 1 if pivotDist = k return list else if k < pivotDist right := pivotNewIndex - 1 else k := k - pivotDist left := pivotNewIndex + 1

Like quicksort, the performance of the algorithm is sensitive to the pivot that is chosen. If bad pivots are consistently chosen, this degrades to the minimum-based selection described previously, and so can require as much as O(n2) time. David Musser describes a "median-of-3 killer" sequence that can force the well-known median-of-three pivot selection algorithm to fail with worst-case behavior (see Introselect section below).

Despite its very easy pseudo code, unlike the quicksort, the quickselect algorithm is notoriously hard to get right, mainly because of the many different edge cases like repeating values. However, if done properly, a Java implementation is typically a magnitude (10x) faster than the quicksort algorithm.

Read more about this topic:  Selection Algorithm

Famous quotes containing the words general and/or selection:

    That sort of half sigh, which, accompanied by two or three slight nods of the head, is pity’s small change in general society.
    Charles Dickens (1812–1870)

    Every writer is necessarily a critic—that is, each sentence is a skeleton accompanied by enormous activity of rejection; and each selection is governed by general principles concerning truth, force, beauty, and so on.... The critic that is in every fabulist is like the iceberg—nine-tenths of him is under water.
    Thornton Wilder (1897–1975)