dstats ~master (2018-01-23T10:54:54Z)

Type: double[]

The beta coefficients for the regression model.

References:

http://en.wikipedia.org/wiki/Logistic_regression

http://socserv.mcmaster.ca/jfox/Courses/UCLA/logistic-regression-notes.pdf

S. Le Cessie and J. C. Van Houwelingen. Ridge Estimators in Logistic Regression. Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 41, No. 1 (1992), pp. 191-201

Frank E Harrell Jr (2009). Design: Design Package. R package version 2.3-0. http://CRAN.R-project.org/package=Design

Computes a logistic regression using a maximum likelihood estimator and returns the beta coefficients. This is a generalized linear model whose inverse link (mean) function is f(XB) = 1 / (1 + exp(-XB)). It is generally used to model the probability that a binary Y variable is 1 given a set of X variables.
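As a minimal illustration of the model (plain Python, not dstats code; the names logistic and predict_probability are hypothetical), the sketch below applies the standard logistic function 1 / (1 + exp(-XB)) to a linear predictor, so a larger XB means a higher probability that Y is 1. The coefficients are made up for the example.

```python
import math

def logistic(xb):
    """Map a linear predictor XB to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xb))

def predict_probability(beta, x):
    """P(Y = 1 | x) for one observation; x includes a 1 for the intercept."""
    xb = sum(b * xi for b, xi in zip(beta, x))
    return logistic(xb)

# Hypothetical coefficients: intercept 0.0 and slope 2.0.
beta = [0.0, 2.0]
print(predict_probability(beta, [1.0, 0.0]))  # 0.5, since XB = 0
```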

For the purpose of this function, Y variables are interpreted as Booleans, regardless of their type. X may be either a range of ranges or a tuple of ranges. Unlike in linearRegress, however, the X ranges are copied to an array if they are not random access ranges. Each value is accessed several times, so if your range is a map over something expensive to compute, you may want to evaluate it eagerly.

If the last parameter passed in is a numeric value instead of a range, it is interpreted as a ridge parameter and ridge regression is performed. This penalizes the L2 norm of the beta vector (in a scaled space) and results in more parsimonious models. It limits the usefulness of inference techniques (p-values, confidence intervals), however, and is therefore not offered in logisticRegress().

If no ridge parameter is passed, or equivalently if the ridge parameter is zero, then ordinary maximum likelihood regression is performed.
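To make the role of the ridge parameter concrete, here is a rough Python sketch of one plausible penalized objective. The exact form is an assumption made for illustration and is not taken from the dstats source: the log-likelihood minus ridge times the sum of var_j * beta_j^2, where var_j is the biased variance of the j-th X column. All function names are hypothetical.

```python
import math

def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood; any truthy y counts as a success."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-xb))
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def biased_variance(values):
    """Variance with divisor n (not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def penalized_objective(beta, xs, ys, ridge=0.0):
    """Log-likelihood minus ridge * sum_j var_j * beta_j^2 (assumed form)."""
    penalty = 0.0
    for j, b in enumerate(beta):
        column = [x[j] for x in xs]
        penalty += biased_variance(column) * b * b
    return log_likelihood(beta, xs, ys) - ridge * penalty

# Made-up data: first column is the constant 1 for the intercept.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 0, 1, 1]
beta = [-1.5, 1.0]

plain = log_likelihood(beta, xs, ys)
print(penalized_objective(beta, xs, ys, 0.0) == plain)  # True
print(penalized_objective(beta, xs, ys, 0.5) < plain)   # True
```

With a ridge value of zero the penalty term vanishes and the objective is the ordinary log-likelihood. In this sketch a constant column has zero variance, so the intercept contributes nothing to the penalty.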

Note that, while this implementation of ridge regression was tested against the R Design Package implementation, it uses slightly different conventions that make the results not comparable without transformation. dstats uses a biased estimate of the variance to scale the beta vector penalties, while Design uses an unbiased estimate. Furthermore, Design penalizes by 1/2 of the L2 norm, whereas dstats penalizes by the L2 norm. Therefore, if n is the sample size, and lambda is the penalty used with dstats, the proper penalty to use in Design to get the same results is 2 * (n - 1) * lambda / n.
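The conversion above is simple arithmetic. The following Python sketch evaluates it; the helper name design_penalty is hypothetical and is not part of dstats or Design.

```python
def design_penalty(dstats_lambda, n):
    """Penalty for R's Design package that matches a dstats ridge parameter.

    Implements 2 * (n - 1) * lambda / n, where n is the sample size.
    """
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return 2.0 * (n - 1) * dstats_lambda / n

# Example: lambda = 0.5 in dstats with 100 observations.
print(design_penalty(0.5, 100))  # 0.99
```

For large n the factor (n - 1) / n approaches 1, so the Design penalty is roughly twice the dstats penalty.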

Also note that, as in linearRegress, repeat(1) can be used for the intercept term.