dstats.cor

Pearson, Spearman and Kendall correlations, covariance.

Members

Functions

covariance
double covariance(T input1, U input2)

covarianceMatrix
SymmetricMatrix!double covarianceMatrix(RoR mat, TaskPool pool = null)

Computes the covariance matrix of the vectors in mat. Like pearsonMatrix, spearmanMatrix and kendallMatrix, this function is optimized to avoid computing certain values multiple times when one vector is compared against several others.

covarianceMatrix
void covarianceMatrix(RoR mat, ref Ret ans, TaskPool pool = null)

This overload stores the covariance matrix in a pre-allocated variable, ans, instead of returning a new matrix. ans must be either a SciD matrix or a random-access range of ranges with assignable elements of a floating point type. It must have one row per vector in mat, and each row must have at least enough columns to hold the lower triangle. If ans is a full rectangular matrix or range of ranges, only the lower triangle is written.
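The storage requirement above can be sketched with plain D arrays. This is a minimal illustration and not the dstats implementation: naiveCovariance, allocLowerTriangle and fillLowerTriangle are hypothetical helpers, and the N - 1 denominator is an assumption of the sketch rather than something this page states about covariance.

```d
import std.math : fabs, isNaN;

// Hypothetical helper (not part of dstats): sample covariance of two
// equal-length arrays. The N - 1 denominator is an assumption of this sketch.
double naiveCovariance(const double[] x, const double[] y)
{
    assert(x.length == y.length && x.length > 1);
    double meanX = 0, meanY = 0;
    foreach (i; 0 .. x.length)
    {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= x.length;
    meanY /= y.length;

    double acc = 0;
    foreach (i; 0 .. x.length)
        acc += (x[i] - meanX) * (y[i] - meanY);
    return acc / (x.length - 1);
}

// Allocates a jagged buffer in which row i has i + 1 columns: just enough
// to hold the lower triangle, diagonal included.
double[][] allocLowerTriangle(size_t n)
{
    auto ans = new double[][](n);
    foreach (i; 0 .. n)
        ans[i] = new double[](i + 1);
    return ans;
}

// Writes only ans[i][j] with j <= i, so ans may be jagged or rectangular.
void fillLowerTriangle(const double[][] mat, double[][] ans)
{
    assert(ans.length == mat.length);
    foreach (i; 0 .. mat.length)
    {
        assert(ans[i].length >= i + 1);
        foreach (j; 0 .. i + 1)
            ans[i][j] = naiveCovariance(mat[i], mat[j]);
    }
}

unittest
{
    double[][] mat = [[1.0, 2, 3, 4], [1.0, 3, 2, 4]];
    auto ans = allocLowerTriangle(mat.length);
    fillLowerTriangle(mat, ans);
    assert(ans[0].length == 1 && ans[1].length == 2);
    assert(fabs(ans[1][0] - 4.0 / 3.0) < 1e-12);
}
```

With a rectangular buffer such as new double[][](2, 2), the same fill leaves the entries above the diagonal at their initial value, which is NaN for double in D.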

kendallCor
double kendallCor(T input1, U input2)

Kendall's Tau-b, O(N log N) version. This is a non-parametric measure of monotonic association. It can be defined in terms of the bubble sort distance: the number of swaps a bubble sort would need to put input2 into the same order as input1.
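To make the definition concrete, here is a naive O(N^2) reference sketch of Tau-b that counts pairs directly. It is not the O(N log N) algorithm kendallCor uses; naiveKendallTauB and cmp3 are hypothetical names introduced only for illustration. When there are no ties, the discordant-pair count is the bubble sort distance described above.

```d
import std.math : fabs, sqrt;

// Hypothetical helper: three-way comparison returning -1, 0 or 1.
int cmp3(double a, double b)
{
    return a < b ? -1 : (a > b ? 1 : 0);
}

// Naive Tau-b: (concordant - discordant) divided by the geometric mean of
// the number of pairs not tied in x and the number of pairs not tied in y.
double naiveKendallTauB(const double[] x, const double[] y)
{
    assert(x.length == y.length);
    long concordant = 0, discordant = 0;
    long untiedX = 0, untiedY = 0;

    foreach (i; 0 .. x.length)
    {
        foreach (j; i + 1 .. x.length)
        {
            immutable sx = cmp3(x[i], x[j]);
            immutable sy = cmp3(y[i], y[j]);
            if (sx != 0) ++untiedX;
            if (sy != 0) ++untiedY;
            if (sx * sy > 0) ++concordant;      // same relative order
            else if (sx * sy < 0) ++discordant; // opposite relative order
        }
    }
    return (concordant - discordant) / sqrt(cast(double) untiedX * untiedY);
}

unittest
{
    // One adjacent swap separates the two orders: 5 concordant pairs,
    // 1 discordant pair, no ties, so tau = (5 - 1) / 6.
    immutable tau = naiveKendallTauB([1.0, 2, 3, 4], [1.0, 3, 2, 4]);
    assert(fabs(tau - 4.0 / 6.0) < 1e-12);
}
```

The quadratic loop is only practical for small inputs, but it is a convenient cross-check for a faster implementation.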

kendallCorDestructive
double kendallCorDestructive(R1 input1, R2 input2)

Kendall's Tau-b in O(N log N) time. This version overwrites the input arrays with undefined data, but it uses only O(log N) stack space for sorting instead of O(N) space to duplicate the input. R1 and R2 must each be either an array or a SortedRange struct with the default predicate.

kendallMatrix
SymmetricMatrix!double kendallMatrix(RoR mat, TaskPool pool = null)

Computes the Kendall correlation matrix of the vectors in mat. Like pearsonMatrix, spearmanMatrix and covarianceMatrix, this function is optimized to avoid computing certain values multiple times when one vector is compared against several others.

kendallMatrix
void kendallMatrix(RoR mat, ref Ret ans, TaskPool pool = null)

This overload stores the Kendall correlation matrix in a pre-allocated variable, ans, instead of returning a new matrix. ans must be either a SciD matrix or a random-access range of ranges with assignable elements of a floating point type. It must have one row per vector in mat, and each row must have at least enough columns to hold the lower triangle. If ans is a full rectangular matrix or range of ranges, only the lower triangle is written.

partial
double partial(T vec1, U vec2, V conditionsIn)

Computes the partial correlation between vec1 and vec2 given conditionsIn. conditionsIn can be a tuple of ranges, a range of ranges, or (for a single condition) a single range.
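For the single-condition case, the partial correlation has a closed form in terms of the three pairwise Pearson correlations: (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2)). The sketch below applies that textbook formula. It is an illustration under stated assumptions, not necessarily how partial is implemented; naivePearson, partialFromCors and naivePartial are hypothetical names.

```d
import std.math : fabs, sqrt;

// Hypothetical helper: two-pass Pearson correlation of equal-length arrays.
double naivePearson(const double[] x, const double[] y)
{
    assert(x.length == y.length && x.length > 1);
    double meanX = 0, meanY = 0;
    foreach (i; 0 .. x.length)
    {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= x.length;
    meanY /= y.length;

    double sxx = 0, syy = 0, sxy = 0;
    foreach (i; 0 .. x.length)
    {
        immutable dx = x[i] - meanX;
        immutable dy = y[i] - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    return sxy / sqrt(sxx * syy);
}

// First-order partial correlation from the three pairwise correlations.
double partialFromCors(double rxy, double rxz, double ryz)
{
    return (rxy - rxz * ryz) / sqrt((1 - rxz * rxz) * (1 - ryz * ryz));
}

// Partial correlation of x and y given a single condition z.
double naivePartial(const double[] x, const double[] y, const double[] z)
{
    return partialFromCors(naivePearson(x, y), naivePearson(x, z),
                           naivePearson(y, z));
}

unittest
{
    // z is uncorrelated with both x and y here, so conditioning on it
    // leaves the plain Pearson correlation of x and y (0.8) unchanged.
    immutable r = naivePartial([1.0, 2, 3, 4], [1.0, 3, 2, 4], [1.0, -1, -1, 1]);
    assert(fabs(r - 0.8) < 1e-12);
}
```

With more than one condition the formula can be applied recursively or replaced by a regression-based approach; this sketch covers only the single-range case.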

pearsonCor
PearsonCor pearsonCor(T input1, U input2)

Convenience function for calculating the Pearson correlation. When the term correlation is used unqualified, it usually refers to this quantity. This is a parametric correlation metric and should not be used with extremely ill-behaved data. The function works with any pair of input ranges.

pearsonMatrix
SymmetricMatrix!double pearsonMatrix(RoR mat, TaskPool pool = null)

Computes the Pearson correlation matrix of the vectors in mat. Like spearmanMatrix, kendallMatrix and covarianceMatrix, this function is optimized to avoid computing certain values multiple times when one vector is compared against several others.
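One plausible form of the reuse described above is to center and normalize each vector once, after which every pairwise correlation is a single dot product. The sketch below shows that idea only. Whether dstats reuses work in this particular way is an assumption, and standardize and naivePearsonMatrix are hypothetical names.

```d
import std.math : fabs, sqrt;

// Hypothetical helper: returns a centered copy of v scaled to unit length.
// A constant vector has zero norm, so its entries come out as NaN.
double[] standardize(const double[] v)
{
    double mean = 0;
    foreach (e; v)
        mean += e;
    mean /= v.length;

    auto result = new double[](v.length);
    double sumSq = 0;
    foreach (i, e; v)
    {
        result[i] = e - mean;
        sumSq += result[i] * result[i];
    }
    immutable norm = sqrt(sumSq);
    foreach (ref e; result)
        e /= norm;
    return result;
}

// Lower-triangle Pearson matrix: row i holds columns 0 .. i inclusive.
double[][] naivePearsonMatrix(const double[][] mat)
{
    // The per-vector work (mean, norm) is done once here, not once per pair.
    auto z = new double[][](mat.length);
    foreach (i, row; mat)
        z[i] = standardize(row);

    auto ans = new double[][](mat.length);
    foreach (i; 0 .. mat.length)
    {
        ans[i] = new double[](i + 1);
        foreach (j; 0 .. i + 1)
        {
            double dot = 0;
            foreach (k; 0 .. z[i].length)
                dot += z[i][k] * z[j][k];
            ans[i][j] = dot;
        }
    }
    return ans;
}

unittest
{
    double[][] mat = [[1.0, 2, 3, 4], [2.0, 4, 6, 8], [4.0, 3, 2, 1]];
    auto cors = naivePearsonMatrix(mat);
    assert(fabs(cors[1][0] - 1.0) < 1e-12);  // perfectly correlated
    assert(fabs(cors[2][0] + 1.0) < 1e-12);  // perfectly anti-correlated
}
```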

pearsonMatrix
void pearsonMatrix(RoR mat, ref Ret ans, TaskPool pool = null)

This overload stores the Pearson correlation matrix in a pre-allocated variable, ans, instead of returning a new matrix. ans must be either a SciD matrix or a random-access range of ranges with assignable elements of a floating point type. It must have one row per vector in mat, and each row must have at least enough columns to hold the lower triangle. If ans is a full rectangular matrix or range of ranges, only the lower triangle is written.

spearmanCor
double spearmanCor(R input1, S input2)

Spearman's rank correlation, a non-parametric measure. It is essentially the Pearson correlation of the ranks of the data, with tied values assigned the average of their ranks.
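The description above translates almost directly into code: rank each input with ties averaged, then take the Pearson correlation of the ranks. The following is a minimal reference sketch, not the dstats implementation; averageRanks, naivePearson and naiveSpearman are hypothetical names.

```d
import std.algorithm : sort;
import std.array : array;
import std.math : fabs, sqrt;
import std.range : iota;

// Hypothetical helper: 1-based ranks of v, tied values share their average rank.
double[] averageRanks(const double[] v)
{
    auto order = iota(v.length).array;      // indices 0, 1, ..., N - 1
    sort!((a, b) => v[a] < v[b])(order);    // indices ordered by value

    auto ranks = new double[](v.length);
    size_t start = 0;
    while (start < order.length)
    {
        // Extend end over the run of values equal to v[order[start]].
        size_t end = start + 1;
        while (end < order.length && v[order[end]] == v[order[start]])
            ++end;
        // The run occupies 1-based ranks start + 1 .. end inclusive.
        immutable avg = (start + 1 + end) / 2.0;
        foreach (k; start .. end)
            ranks[order[k]] = avg;
        start = end;
    }
    return ranks;
}

// Hypothetical helper: two-pass Pearson correlation of equal-length arrays.
double naivePearson(const double[] x, const double[] y)
{
    assert(x.length == y.length && x.length > 1);
    double meanX = 0, meanY = 0;
    foreach (i; 0 .. x.length)
    {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= x.length;
    meanY /= y.length;

    double sxx = 0, syy = 0, sxy = 0;
    foreach (i; 0 .. x.length)
    {
        immutable dx = x[i] - meanX;
        immutable dy = y[i] - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    return sxy / sqrt(sxx * syy);
}

double naiveSpearman(const double[] x, const double[] y)
{
    return naivePearson(averageRanks(x), averageRanks(y));
}

unittest
{
    // The two 20s occupy ranks 2 and 3, so each receives 2.5.
    assert(averageRanks([10.0, 20, 20, 30]) == [1.0, 2.5, 2.5, 4.0]);
    // A monotone but nonlinear relationship still gives a correlation of 1.
    assert(fabs(naiveSpearman([1.0, 2, 3, 4, 5], [1.0, 8, 27, 64, 125]) - 1) < 1e-12);
}
```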

spearmanMatrix
SymmetricMatrix!double spearmanMatrix(RoR mat, TaskPool pool = null)

Computes the Spearman correlation matrix of the vectors in mat. Like pearsonMatrix, kendallMatrix and covarianceMatrix, this function is optimized to avoid computing certain values multiple times when one vector is compared against several others.

spearmanMatrix
void spearmanMatrix(RoR mat, ref Ret ans, TaskPool pool = null)

This overload stores the Spearman correlation matrix in a pre-allocated variable, ans, instead of returning a new matrix. ans must be either a SciD matrix or a random-access range of ranges with assignable elements of a floating point type. It must have one row per vector in mat, and each row must have at least enough columns to hold the lower triangle. If ans is a full rectangular matrix or range of ranges, only the lower triangle is written.

Structs

PearsonCor
struct PearsonCor

Allows online computation of the means, standard deviations, variances, covariance and Pearson correlation of paired data. The getters for stdev, var, cov and cor cost floating point division ops. The getters for the means cost a single branch to check for N == 0. This struct uses O(1) space.
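The idea of an O(1)-space online accumulator can be sketched with Welford-style running updates. This is an illustration only: OnlinePearson and its member names are hypothetical and do not reflect the actual layout or getters of PearsonCor, and returning NaN from the mean getters when nothing has been added is an assumption of the sketch.

```d
import std.math : fabs, isNaN, sqrt;

// Hypothetical accumulator: six running values, regardless of input size.
struct OnlinePearson
{
    private double _n = 0;
    private double _meanX = 0, _meanY = 0;
    private double _m2X = 0, _m2Y = 0, _coMoment = 0;

    // Adds one (x, y) observation using Welford-style updates.
    void put(double x, double y)
    {
        _n += 1;
        immutable dx = x - _meanX;  // deviation from the old mean of x
        immutable dy = y - _meanY;  // deviation from the old mean of y
        _meanX += dx / _n;
        _meanY += dy / _n;
        _m2X += dx * (x - _meanX);       // old deviation times new deviation
        _m2Y += dy * (y - _meanY);
        _coMoment += dx * (y - _meanY);
    }

    // One branch on N == 0; NaN for an empty accumulator is an assumption.
    double meanX() const { return _n == 0 ? double.nan : _meanX; }
    double meanY() const { return _n == 0 ? double.nan : _meanY; }

    // Sample covariance (N - 1 denominator, also an assumption of the sketch).
    double cov() const { return _coMoment / (_n - 1); }

    // Pearson correlation of everything added so far.
    double cor() const { return _coMoment / sqrt(_m2X * _m2Y); }
}

unittest
{
    double[] xs = [1.0, 2, 3, 4];
    double[] ys = [1.0, 3, 2, 4];
    OnlinePearson acc;
    foreach (i; 0 .. xs.length)
        acc.put(xs[i], ys[i]);
    assert(fabs(acc.cor() - 0.8) < 1e-12);
    assert(fabs(acc.cov() - 4.0 / 3.0) < 1e-12);
}
```

Because each observation is folded in and then discarded, the accumulator can consume any input range in a single pass.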

Meta

Authors

David Simcha