dstats ~master (2018-01-23T10:54:54Z)

The statistical tests performed in this function assume that an intercept term is included in your regression model. If no intercept term is included, the P-values, confidence intervals and adjusted R^2 values calculated by this function will be wrong.

    import std.algorithm : map;
    import std.range : repeat;
    import dstats.regress : linearRegress;

    int[] nBeers = [8,6,7,5,3,0,9];
    int[] nCoffees = [3,6,2,4,3,6,8];
    int[] musicVolume = [3,1,4,1,5,9,2];
    int[] programmingSkill = [2,7,1,8,2,8,1];

    // Using default confidence interval:
    auto results = linearRegress(programmingSkill, repeat(1), nBeers, nCoffees,
        musicVolume, map!"a * a"(musicVolume));

    // Using user-specified confidence interval
    // (named differently so both calls can share a scope):
    auto resultsCustom = linearRegress(programmingSkill, repeat(1), nBeers, nCoffees,
        musicVolume, map!"a * a"(musicVolume), 0.8675309);

Performs a linear regression as in linearRegressBeta, but returns a RegressRes containing the statistics needed for statistical inference. If the last element of input is a real, it specifies the confidence level of the intervals to be calculated. Otherwise, the default of 0.95 is used. The remaining elements of input should be the ranges that make up X.

Because this function computes several statistics useful for inference, each range must be traversed twice. This means:

1. They have to be forward ranges, not input ranges.

2. If you have a large amount of data and you're mapping it to some expensive function, you may want to do this eagerly instead of lazily.
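The two points above can be sketched with Phobos alone. This is a minimal illustration, not part of dstats: expensiveTransform and eagerPredictor are hypothetical names, and the logarithm merely stands in for a costly computation. A lazy map re-runs the function on every traversal, whereas calling array stores the results once, and both forms satisfy the forward range requirement.

```d
import std.algorithm : map;
import std.array : array;
import std.math : log;
import std.range.primitives : isForwardRange;

// Hypothetical stand-in for a costly per-element transformation.
double expensiveTransform(int x)
{
    return log(1.0 + x);
}

// Evaluate the transformation once and store the results in an array.
double[] eagerPredictor(int[] raw)
{
    return raw.map!expensiveTransform.array;
}

void main()
{
    int[] musicVolume = [3, 1, 4, 1, 5, 9, 2];

    // Lazy: expensiveTransform runs again each time the range is traversed.
    auto lazyPredictor = musicVolume.map!expensiveTransform;
    static assert(isForwardRange!(typeof(lazyPredictor)));

    // Eager: the results are computed here; later traversals only read the array.
    double[] stored = eagerPredictor(musicVolume);
    static assert(isForwardRange!(double[]));
    assert(stored.length == musicVolume.length);
}
```

Either lazyPredictor or stored could be passed as one of the X ranges; the stored array trades memory for not recomputing the transformation on the second traversal.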

Notes:

The X ranges are traversed in lockstep, but the traversal is stopped at the end of the shortest one. Therefore, using infinite ranges is safe. For example, using repeat(1) to get an intercept term works.
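The stopping behaviour can be illustrated with Phobos's zip, which also stops at the shortest range by default. This is only a sketch of the idea: whether dstats uses zip internally is not stated here, and lockstepRows is a hypothetical helper.

```d
import std.range : repeat, zip;
import std.range.primitives : isInfinite;

// Count the rows visited when an infinite intercept column is traversed
// in lockstep with a finite predictor.
size_t lockstepRows(int[] predictor)
{
    auto intercept = repeat(1); // infinite range of 1s
    static assert(isInfinite!(typeof(intercept)));

    size_t rows = 0;
    // zip's default stopping policy ends at the shortest range.
    foreach (pair; zip(intercept, predictor))
    {
        assert(pair[0] == 1); // intercept value for this row
        ++rows;
    }
    return rows;
}

void main()
{
    int[] nBeers = [8, 6, 7, 5, 3, 0, 9];
    // Traversal ends with the finite predictor, not the infinite intercept.
    assert(lockstepRows(nBeers) == nBeers.length);
}
```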

If the confidence level specified is exactly 0, this is treated as a special case and the confidence interval calculation is skipped. This can speed things up significantly and can therefore be useful in Monte Carlo and possibly data mining contexts.
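A call that takes this fast path might look like the sketch below. It reuses the call shape and data from the example on this page; the import path dstats.regress is an assumption, as is the choice to write the zero as the floating-point literal 0.0.

```d
import std.range : repeat;
import dstats.regress : linearRegress; // assumed module path

void main()
{
    int[] nBeers = [8, 6, 7, 5, 3, 0, 9];
    int[] nCoffees = [3, 6, 2, 4, 3, 6, 8];
    int[] programmingSkill = [2, 7, 1, 8, 2, 8, 1];

    // The trailing 0.0 is the confidence level; a value of exactly 0
    // requests that the confidence interval calculation be skipped.
    auto quickFit = linearRegress(programmingSkill, repeat(1), nBeers, nCoffees, 0.0);
}
```

In a Monte Carlo loop, the same call would be repeated on each resampled data set, which is where skipping the interval calculation is most likely to pay off.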