dstats.sort

A comprehensive sorting library for statistical functions. Each function takes N arguments, which are arrays or array-like objects, sorts the first and sorts the rest in lockstep. For merge and insertion sort, if the last argument is a ulong*, increments the dereference of this ulong* by the bubble sort distance between the first argument and the sorted version of the first argument. This is useful for some statistical calculations.

All sorting functions have the precondition that all parallel input arrays must have the same length.

Notes:

Comparison functions must be written such that compFun(x, x) == false. For example, "a < b" is good, "a <= b" is not.

These functions are heavily optimized for sorting arrays of ints and floats (by far the most common case when doing statistical calculations). In these cases, they can be several times faster than the equivalent functions in std.algorithm. Since sorting is extremely important for non-parametric statistics, this results in important real-world performance gains. However, it comes at a price in terms of generality:

1. They assume that what they are sorting is cheap to copy via normal assignment.

2. They don't work at all with general ranges, only arrays and maybe ranges very similar to arrays.

3. All tuning and micro-optimization is done with ints and floats, not classes, large structs, strings, etc.

Members

Classes

SortException
class SortException
Undocumented in source.

Enums

DATA
anonymousenum DATA
Undocumented in source.

Functions

bubbleSort
T[0] bubbleSort(T data)
Undocumented in source. Be warned that the author may not have intended to support it.
heapSort
T[0] heapSort(T data)

Heap sort. Unstable, O(N log N) time average and worst case, O(1) space, * large constant term in time complexity.

heapSortImpl
T[0] heapSortImpl(T input)
Undocumented in source. Be warned that the author may not have intended to support it.
insertionSort
T[0] insertionSort(T data)

Insertion sort. O(N<sup>2</sup>) time worst, average case, O(1) space, VERY * small constant, which is why it's useful for sorting small subarrays in * divide and conquer algorithms. If last argument is a ulong*, increments * the dereference of this argument by the bubble sort distance between the * input array and the sorted version of the input.

insertionSortImpl
T[0] insertionSortImpl(T data)
Undocumented in source. Be warned that the author may not have intended to support it.
intIsNaN
bool intIsNaN(I i)
Undocumented in source. Be warned that the author may not have intended to support it.
makeMultiHeap
void makeMultiHeap(T input)
Undocumented in source. Be warned that the author may not have intended to support it.
medianOf3
size_t medianOf3(T[] data)
Undocumented in source. Be warned that the author may not have intended to support it.
merge
void merge(T data)
Undocumented in source. Be warned that the author may not have intended to support it.
mergeInPlace
void mergeInPlace(T data, size_t middle)
Undocumented in source. Be warned that the author may not have intended to support it.
mergeSort
T[0] mergeSort(T data)

Merge sort. O(N log N) time, O(N) space, small constant. Stable sort. * If last argument is a ulong* instead of an array-like type, * the dereference of the ulong* will be incremented by the bubble sort * distance between the input array and the sorted version. This is useful * in some statistics functions such as Kendall's tau.

mergeSortImpl
uint mergeSortImpl(T dataIn)
Undocumented in source. Be warned that the author may not have intended to support it.
mergeSortInPlace
T[0] mergeSortInPlace(T data)

In-place merge sort, based on C++ STL's stable_sort(). O(N log<sup>2</sup> N) * time complexity, O(1) space complexity, stable. Much slower than plain * old mergeSort(), so only use it if you really need the O(1) space.

mergeSortInPlaceImpl
T[0] mergeSortInPlaceImpl(T data)
Undocumented in source. Be warned that the author may not have intended to support it.
mergeSortTemp
T[0] mergeSortTemp(T data)

Merge sort, allowing caller to provide a temp variable. This allows recycling instead of repeated allocations. If D is data, T is temp, and U is a ulong* for calculating bubble sort distance, this can be called as mergeSortTemp(D, D, D, T, T, T, U) or mergeSortTemp(D, D, D, T, T, T) where each D has a T of corresponding type.

multiSiftDown
void multiSiftDown(T input, size_t root, size_t end)
Undocumented in source. Be warned that the author may not have intended to support it.
partitionK
ElementType!(T[0]) partitionK(T data, ptrdiff_t k)

Partitions the input data according to compFun, such that position k contains the kth largest/smallest element according to compFun. For all elements e with indices < k, !compFun(datak, e) is guaranteed to be true. For all elements e with indices > k, !compFun(e, datak) is guaranteed to be true. For example, if compFun is "a < b", all elements with indices < k will be <= datak, and all elements with indices larger than k will be >= k. Reorders any additional input arrays in lockstep.

partitionKImpl
ElementType!(T[0]) partitionKImpl(T data, ptrdiff_t k)
Undocumented in source. Be warned that the author may not have intended to support it.
postProcess
void postProcess(T arr)
Undocumented in source. Be warned that the author may not have intended to support it.
postProcess
void postProcess(F arr)
Undocumented in source. Be warned that the author may not have intended to support it.
prepareForSorting
auto prepareForSorting(T arr)
Undocumented in source. Be warned that the author may not have intended to support it.
qsort
T[0] qsort(T data)

Quick sort. Unstable, O(N log N) time average, worst * case, O(log N) space, small constant term in time complexity. * * In this implementation, the following steps are taken to avoid the * O(N<sup>2</sup>) worst case of naive quick sorts: * * 1. At each recursion, the median of the first, middle and last elements of * the array is used as the pivot. * * 2. To handle the case of few unique elements, the "Fit Pivot" technique * previously decribed by Andrei Alexandrescu is used. This allows * reasonable performance with few unique elements, with zero overhead * in other cases. * * 3. After a much larger than expected amount of recursion has occured, * this function transitions to a heap sort. This guarantees an O(N log N) * worst case.

qsortImpl
void qsortImpl(T data, uint TTL)
Undocumented in source. Be warned that the author may not have intended to support it.
quickSelect
T quickSelect(T[] data, sizediff_t k)

Returns the kth largest/smallest element (depending on compFun, 0-indexed) * in the input array in O(N) time. Allocates memory, does not modify input * array.

removeWhitespace
string removeWhitespace(string input)
Undocumented in source. Be warned that the author may not have intended to support it.
rotateLeft
void rotateLeft(T input)
Undocumented in source. Be warned that the author may not have intended to support it.
rotateRight
void rotateRight(T input)
Undocumented in source. Be warned that the author may not have intended to support it.

Static functions

testFloating
void testFloating()
Undocumented in source. Be warned that the author may not have intended to support it.

Structs

TopN
struct TopN(T, alias compFun = "a > b")

Given a set of data points entered through the put function, this output range maintains the invariant that the top N according to compFun will be contained in the data structure. Uses a heap internally, O(log N) insertion time. Good for finding the largest/smallest N elements of a very large dataset that cannot be sorted quickly in its entirety, and may not even fit in memory. If less than N datapoints have been entered, all are contained in the structure.

Templates

ArrayElemType
template ArrayElemType(T : T[])
Undocumented in source.
isSimpleComparison
template isSimpleComparison(alias comp)
Undocumented in source.

Examples

auto foo = [3, 1, 2, 4, 5].dup;
auto bar = [8, 6, 7, 5, 3].dup;
qsort(foo, bar);
assert(foo == [1, 2, 3, 4, 5]);
assert(bar == [6, 7, 8, 5, 3]);
auto baz = [1.0, 0, -1, -2, -3].dup;
mergeSort!("a > b")(bar, foo, baz);
assert(bar == [8, 7, 6, 5, 3]);
assert(foo == [3, 2, 1, 4, 5]);
assert(baz == [-1.0, 0, 1, -2, -3]);

Meta

Authors

David Simcha