dstats.summary

Summary statistics such as mean, median, sum, variance, skewness, kurtosis. Except for median and median absolute deviation, which cannot be calculated online, all summary statistics have both an input range interface and an output range interface.

Notes: The put method on the structs defined in this module returns this by ref. The use case for returning this is to enable these structs to be used with std.algorithm.reduce. The rationale for returning by ref is that the return value usually won't be used, and the overhead of returning a large struct by value should be avoided.
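
As a minimal sketch of this convention, the hypothetical RunningMean struct below (not a dstats type; the names RunningMean, total and n are illustrative assumptions) has a put that returns this by ref. It shows chained calls on one accumulator and the struct serving as the seed of std.algorithm.reduce.

```d
import std.algorithm : reduce;

// Hypothetical stand-in for an online accumulator; not a dstats type.
struct RunningMean
{
    double total = 0;
    size_t n = 0;

    // Returning `this` by ref allows chaining without copying the struct.
    ref RunningMean put(double x)
    {
        total += x;
        ++n;
        return this;
    }

    // Mean of the elements seen so far; NaN when nothing has been put.
    double mean() const
    {
        return n == 0 ? double.nan : total / n;
    }
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    RunningMean acc;
    acc.put(1).put(2).put(3);   // chained calls update the same object
    assert(acc.mean == 2);

    // The accumulator is the seed; each step feeds one element to put.
    auto viaReduce = reduce!((a, x) => a.put(x))(RunningMean.init, [2.0, 4.0, 6.0]);
    assert(viaReduce.mean == 4);
}
```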

Members

Functions

geometricMean
double geometricMean(T data)

Calculates the geometric mean of any input range whose elements are implicitly convertible to double.
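
For reference, one common way to compute a geometric mean is to average the logarithms and exponentiate the result. The sketch below assumes strictly positive inputs; geoMeanSketch is a hypothetical helper and is not claimed to match the dstats implementation (for example, its NaN result for empty input is an assumption).

```d
import std.math : exp, log;

// Hypothetical helper, not the dstats implementation.
// Averages the natural logs, then exponentiates; inputs must be positive.
double geoMeanSketch(R)(R data)
{
    double logSum = 0;
    size_t n = 0;
    foreach (x; data)
    {
        double v = x;   // relies on implicit conversion to double
        logSum += log(v);
        ++n;
    }
    return n == 0 ? double.nan : exp(logSum / n);
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    import std.math : isClose;
    // The geometric mean of 2 and 8 is sqrt(2 * 8) = 4.
    assert(isClose(geoMeanSketch([2.0, 8.0]), 4.0));
    // Integer elements work through the implicit conversion.
    assert(isClose(geoMeanSketch([1, 10, 100]), 10.0));
}
```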

interquantileRange
double interquantileRange(R data, double quantile = 0.25)

Computes the interquantile range of data at the given quantile value in O(N) time. For example, a quantile value of either 0.25 or 0.75 gives the interquartile range. (This is the default since it is apparently the most commonly used interquantile range.) A quantile value of 0.2 or 0.8 gives the interquintile range.

kurtosis
double kurtosis(T data)

Excess kurtosis relative to normal distribution. High kurtosis means that the variance is due to infrequent, large deviations from the mean. Low kurtosis means that the variance is due to frequent, small deviations from the mean. The normal distribution is defined as having kurtosis of 0. Input must be an input range with elements implicitly convertible to double.
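
To make the "excess" part concrete, the sketch below computes the fourth standardized moment and subtracts 3, the value that moment takes for a normal distribution. It assumes population (biased) moments; whether dstats applies the same estimator is not stated here, and excessKurtosisSketch is a hypothetical name.

```d
// Hypothetical helper using population (biased) moments; dstats may differ.
double excessKurtosisSketch(const double[] data)
{
    immutable n = data.length;
    if (n == 0) return double.nan;

    double total = 0;
    foreach (x; data) total += x;
    immutable mean = total / n;

    double m2 = 0, m4 = 0;   // second and fourth central moments
    foreach (x; data)
    {
        immutable d2 = (x - mean) * (x - mean);
        m2 += d2;
        m4 += d2 * d2;
    }
    m2 /= n;
    m4 /= n;
    return m4 / (m2 * m2) - 3;   // subtract the normal distribution's value
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    // Frequent, equal-sized deviations: the lowest possible value, -2.
    assert(excessKurtosisSketch([-1.0, 1.0, -1.0, 1.0]) == -2);
    // Variance driven by two rare, large deviations: positive.
    assert(excessKurtosisSketch([0.0, 0, 0, 0, 0, 0, 0, 0, 10, -10]) > 0);
}
```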

mean
Mean mean(T data)

Finds the arithmetic mean of any input range whose elements are implicitly convertible to double.

meanStdev
MeanSD meanStdev(T data)

Puts all elements of data into a MeanSD struct, then returns this struct. This can be faster than doing so manually due to instruction-level parallelism (ILP) optimizations.

median
double median(T data)

Finds the median of an input range in O(N) time on average. In the case of an even number of elements, the mean of the two middle elements is returned. This is a convenience function designed specifically for numeric types, where the averaging of the two middle elements is desired. A more general selection algorithm that can handle any type with a total ordering, as well as selecting any position in the ordering, can be found at dstats.sort.quickSelect() and dstats.sort.partitionK(). Allocates memory, does not reorder input data.
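
The behaviour described above (copy the data, select the middle element without a full sort, average the two middle elements for even lengths) can be sketched with Phobos alone. The helper below uses std.algorithm.topN as the selection step in place of the dstats.sort routines; medianSketch is a hypothetical name and its NaN result for empty input is an assumption.

```d
import std.algorithm : maxElement, topN;

// Hypothetical helper, not the dstats implementation.
double medianSketch(R)(R data)
{
    double[] buf;
    foreach (x; data)
    {
        double v = x;   // implicit conversion to double
        buf ~= v;       // copying allocates and leaves the input untouched
    }
    if (buf.length == 0) return double.nan;

    immutable mid = buf.length / 2;
    topN(buf, mid);     // buf[mid] now holds the value it would have if sorted
    if (buf.length % 2 == 1) return buf[mid];

    // Even length: the lower middle is the largest element left of mid.
    return (maxElement(buf[0 .. mid]) + buf[mid]) / 2;
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    int[] input = [4, 1, 3, 2];
    assert(medianSketch(input) == 2.5);
    assert(input == [4, 1, 3, 2]);   // original order preserved
    assert(medianSketch([3, 1, 2]) == 2);
}
```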

medianAbsDev
MedianAbsDev medianAbsDev(T data)

Calculates the median absolute deviation of a dataset. This is the median of all absolute differences from the median of the dataset.
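
A small worked sketch of that definition follows. For brevity it finds each median with a full sort, which is O(N log N) and not necessarily how dstats does it; sortedMedian and madSketch are hypothetical helpers.

```d
import std.algorithm : sort;
import std.math : fabs;

// Sort-based median of a scratch array (reorders its argument).
double sortedMedian(double[] xs)
{
    if (xs.length == 0) return double.nan;
    sort(xs);
    immutable n = xs.length;
    return n % 2 == 1 ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2;
}

// Hypothetical helper, not the dstats implementation.
double madSketch(const double[] data)
{
    immutable med = sortedMedian(data.dup);   // median of the data
    auto deviations = new double[data.length];
    foreach (i, x; data) deviations[i] = fabs(x - med);
    return sortedMedian(deviations);          // median of |x - median|
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    // Median is 3; absolute deviations are 2, 1, 0, 1, 97; their median is 1.
    assert(madSketch([1.0, 2, 3, 4, 100]) == 1);
}
```

The outlier 100 barely moves the result, which is the usual reason for preferring this statistic over the standard deviation on contaminated data.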

medianPartition
double medianPartition(T data)

Median finding as in median(), but will partition input data such that elements less than the median will have smaller indices than that of the median, and elements larger than the median will have larger indices than that of the median. Useful both for its partitioning and to avoid memory allocations. Requires a random access range with swappable elements.

skewness
double skewness(T data)

Skewness is a measure of symmetry of a distribution. Positive skewness means that the right tail is longer/fatter than the left tail. Negative skewness means the left tail is longer/fatter than the right tail. Zero skewness indicates a symmetrical distribution. Input must be an input range with elements implicitly convertible to double.
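
The sign convention can be checked with a short sketch that computes the third standardized moment. It assumes population (biased) moments, which may differ from the estimator dstats uses; skewnessSketch is a hypothetical name.

```d
import std.math : sqrt;

// Hypothetical helper using population (biased) moments; dstats may differ.
double skewnessSketch(const double[] data)
{
    immutable n = data.length;
    if (n == 0) return double.nan;

    double total = 0;
    foreach (x; data) total += x;
    immutable mean = total / n;

    double m2 = 0, m3 = 0;   // second and third central moments
    foreach (x; data)
    {
        immutable d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= n;
    m3 /= n;
    return m3 / (m2 * sqrt(m2));
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    assert(skewnessSketch([1.0, 2, 3]) == 0);        // symmetrical
    assert(skewnessSketch([1.0, 1, 1, 10]) > 0);     // long right tail
    assert(skewnessSketch([-10.0, -1, -1, -1]) < 0); // long left tail
}
```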

stdev
double stdev(T data)

Calculates the standard deviation of an input range with members implicitly convertible to double.

sum
U sum(T data)

Finds the sum of an input range whose elements implicitly convert to double. The user has the option of making U a different type than the input's element type to prevent overflow when summing large arrays. By default, however, the return type is the same as the element type of T.
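
The overflow concern can be illustrated with a small sketch in which the accumulator type is an explicit template parameter. sumSketch is a hypothetical helper with its own parameter order, not the dstats signature.

```d
// Hypothetical helper, not the dstats signature.
// U is both the accumulator type and the return type.
U sumSketch(U, R)(R data)
{
    U total = 0;
    foreach (x; data) total += x;
    return total;
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    int[] big = [int.max, 1];
    // Accumulating in int wraps around on overflow...
    assert(sumSketch!int(big) == int.min);
    // ...while a wider accumulator holds the true total.
    assert(sumSketch!long(big) == 2_147_483_648L);
}
```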

summary
Summary summary(T data)

Convenience function. Puts all elements of data into a Summary struct, and returns this struct.

variance
double variance(T data)

Finds the variance of an input range with members implicitly convertible to double.

zScore
ZScore!(T) zScore(T range)

Returns a range with whatever properties T has (forward range, random access range, bidirectional range, hasLength, etc.), of the z-scores of the underlying range. A z-score of an element in a range is defined as (element - mean(range)) / stdev(range).

zScore
ZScore!(T) zScore(T range, double mean, double sd)

Allows the construction of a ZScore range with precomputed mean and stdev.
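
As a rough illustration of both overloads, the sketch below builds the z-scores lazily with std.algorithm.map, which keeps the length and random access of the underlying array. It assumes the sample standard deviation (N - 1 divisor) and at least two elements; zScoreSketch is a hypothetical helper and not the dstats ZScore type.

```d
import std.algorithm : map, sum;
import std.math : sqrt;

// Precomputed mean and standard deviation: no pass over the data is needed.
auto zScoreSketch(double[] data, double mean, double sd)
{
    return data.map!(x => (x - mean) / sd);   // lazy, evaluated on access
}

// Hypothetical helper; assumes the N - 1 divisor and data.length >= 2.
auto zScoreSketch(double[] data)
{
    immutable n = data.length;
    immutable m = data.sum / n;
    immutable ss = data.map!(x => (x - m) * (x - m)).sum;
    return zScoreSketch(data, m, sqrt(ss / (n - 1)));
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    import std.algorithm : equal;
    auto z = zScoreSketch([1.0, 2.0, 3.0]);    // mean 2, stdev 1
    assert(equal(z, [-1.0, 0.0, 1.0]));
    assert(z.length == 3 && z[2] == 1.0);      // length and indexing survive
    assert(equal(zScoreSketch([10.0, 20.0], 10.0, 5.0), [0.0, 2.0]));
}
```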

Structs

GeometricMean
struct GeometricMean

Output range to calculate the geometric mean online. Operates similarly to dstats.summary.Mean.

Mean
struct Mean

Output range to calculate the mean online. Getter for mean costs a branch to check for N == 0. This struct uses O(1) space and does *NOT* store the individual elements.

MeanSD
struct MeanSD

Output range to compute mean, stdev, variance online. Getter methods for stdev, var cost a few floating point ops. Getter for mean costs a single branch to check for N == 0. Each update involves relatively expensive floating point ops, so if you only need the mean, try Mean. This struct uses O(1) space and does *NOT* store the individual elements.
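
One well-known way to obtain mean and variance online in O(1) space is Welford's update, sketched below. This is offered as an illustration of the idea only; it is not claimed to be the algorithm MeanSD uses, and OnlineMeanSD, the N - 1 divisor, and the NaN results for too few elements are assumptions.

```d
import std.math : sqrt;

// Hypothetical accumulator illustrating Welford's update; not dstats' MeanSD.
struct OnlineMeanSD
{
    private size_t n = 0;
    private double m = 0;    // running mean
    private double m2 = 0;   // running sum of squared deviations from the mean

    ref OnlineMeanSD put(double x)
    {
        ++n;
        immutable delta = x - m;
        m += delta / n;           // update the mean first
        m2 += delta * (x - m);    // then use both old and new deviation
        return this;
    }

    double mean() const { return n == 0 ? double.nan : m; }

    // Sample variance (N - 1 divisor); an assumption of this sketch.
    double var() const { return n < 2 ? double.nan : m2 / (n - 1); }

    double stdev() const { return sqrt(var()); }
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    import std.math : isClose;
    OnlineMeanSD acc;
    foreach (x; [1.0, 2.0, 3.0, 4.0]) acc.put(x);
    assert(isClose(acc.mean, 2.5));
    assert(isClose(acc.var, 5.0 / 3));          // squared deviations sum to 5
    assert(isClose(acc.stdev, sqrt(5.0 / 3)));
}
```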

MedianAbsDev
struct MedianAbsDev

Plain old data holder struct for median, median absolute deviation. Alias this'd to the median absolute deviation member.

Summary
struct Summary

Output range to compute mean, stdev, variance, skewness, kurtosis, min, and max online. Using this struct is relatively expensive, so if you just need mean and/or stdev, try MeanSD or Mean. Getter methods for stdev, var cost a few floating point ops. Getter for mean costs a single branch to check for N == 0. Getters for skewness and kurtosis cost a whole bunch of floating point ops. This struct uses O(1) space and does *NOT* store the individual elements.

ZScore
struct ZScore(T)

Bugs

This whole module assumes that input will be doubles or types implicitly convertible to double. No allowances are made for user-defined numeric types such as BigInts. This is necessary for simplicity. However, if you have a function that converts your data to doubles, most of these functions work with any input range, so you can simply map this function onto your range.
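
The workaround in the last sentence might look like the sketch below, where a conversion function is mapped lazily over a range of a user-defined type. Reading and toVolts are hypothetical, and a plain Phobos average stands in for whichever summary function would consume the range.

```d
import std.algorithm : map, sum;

// Hypothetical user-defined type that does not convert implicitly to double.
struct Reading
{
    string sensor;
    long microvolts;
}

// Hypothetical conversion function.
double toVolts(Reading r)
{
    return r.microvolts / 1_000_000.0;
}

// Usage; run with: dmd -unittest -main -run <file>
unittest
{
    auto readings = [Reading("a", 1_500_000), Reading("b", 2_500_000)];
    auto volts = readings.map!toVolts;   // lazy input range of double
    // Any function that accepts a range of double can consume `volts`.
    immutable avg = volts.sum / readings.length;
    assert(avg == 2.0);
}
```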

Meta

Authors

David Simcha