Median of Medians Algorithm
Class | Selection algorithm |
---|---|
Data structure | Array |
Worst case performance | |
Best case performance | |
Worst case space complexity | auxiliary |
A worst-case linear algorithm for the general case of selecting the kth largest element was published by Blum, Floyd, Pratt, Rivest and Tarjan in their 1973 paper "Time bounds for selection", sometimes called BFPRT after the last names of the authors. It is based on the quickselect algorithm and is also known as the median-of-medians algorithm.
Although quickselect is linear-time on average, it can require quadratic time with poor pivot choices (consider the case of pivoting around the smallest element at each step). The solution to make it O(n) in the worst case is to consistently find "good" pivots. A good pivot is one for which we can establish that a constant proportion of elements fall both below and above it.
The Select algorithm divides the list into groups of five elements. (Left over elements are ignored for now.) Then, for each group of five, the median is calculated (an operation that can potentially be made very fast if the five values can be loaded into registers and compared). (If sorting in-place, then these medians are moved into one contiguous block in the list.) Select is then called recursively on this sublist of n/5 elements to find their true median. Finally, the "median of medians" is chosen to be the pivot.
//returns the index of the median of medians. //requires a variant of select, "selectIdx" which returns the index of the selected item rather than the value function medianOfMedians(list, left, right) numMedians = (right - left) / 5 for i from 0 to numMedians //get the median of the five-element subgroup subLeft = left + i*5 subRight = subLeft + 5 if (subRight > right) subRight = right medianIdx = selectIdx(list, subLeft, subRight, 2) //alternatively, use a faster method that works on lists of size 5 //move the median to a contiguous block at the beginning of the list swap list and list //select the median from the contiguous block return selectIdx(list, left, left + numMedians, numMedians / 2)Read more about this topic: Selection Algorithm, Linear General Selection Algorithm