sci.stat: Statistical Functions

This module contains statistical functions and on-line statistical accumulators. All algorithms are implemented with a focus on numerical stability, so it is preferable to use these functions rather than re-implementing the same functionality with sci.alg operations.

API - Generic

mu = stat.mean(x)

Returns the sample mean of the vector x, based on the estimator: $$ \mu_x = \frac{1}{N}\sum_{i = 1}^{N} x_i $$
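
A minimal sketch of the mean estimator in plain Lua follows. Plain Lua arrays stand in for sci.alg vectors (an assumption), and Kahan compensated summation is used to illustrate one common way of reducing round-off in the sum; whether stat.mean uses this technique is not stated here.

```lua
-- Sketch only: sample mean of a plain Lua array with Kahan (compensated)
-- summation. Assumption: tables replace sci.alg vectors, and the
-- compensation step is illustrative, not necessarily what stat.mean does.
local function mean(x)
  local n = #x
  assert(n >= 1, "mean needs at least one element")
  local sum, comp = 0.0, 0.0 -- running sum and accumulated rounding error
  for i = 1, n do
    local y = x[i] - comp
    local t = sum + y
    comp = (t - sum) - y -- low-order bits of y lost when forming t
    sum = t
  end
  return sum / n
end

local m = mean({ 2, 4, 6, 8 }) -- m == 5
```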

var = stat.var(x)

Returns the 'unbiased' sample variance of the vector x, based on the estimator: $$ \sigma^2_x = \frac{1}{N-1}\sum_{i = 1}^{N} \left(x_i - \mu_x\right)^2 $$ The vector x must have a length of at least 2.
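
The estimator above can be evaluated with the two-pass algorithm: compute the mean first, then sum the squared deviations from it. This avoids the cancellation that affects the algebraically equivalent single-pass formula based on the sum of x_i^2. The sketch below uses plain Lua arrays instead of sci.alg vectors (an assumption) and is not claimed to be how stat.var is implemented.

```lua
-- Sketch only: unbiased sample variance via the two-pass algorithm.
-- Assumption: plain Lua arrays stand in for sci.alg vectors.
local function var(x)
  local n = #x
  assert(n >= 2, "var needs at least two elements")
  local sum = 0.0
  for i = 1, n do sum = sum + x[i] end
  local mu = sum / n -- first pass: the sample mean
  local ss = 0.0
  for i = 1, n do    -- second pass: sum of squared deviations
    local d = x[i] - mu
    ss = ss + d * d
  end
  return ss / (n - 1)
end

local v = var({ 2, 4, 4, 4, 5, 5, 7, 9 }) -- v == 32 / 7 up to rounding
```

Because the deviations are formed before squaring, a large common offset in the data (for example values around 1e9 that differ by 1) does not swamp the result.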

cov = stat.cov(x, y)

Returns the sample covariance between the vectors x and y, based on the estimator: $$ \text{cov}(x,y) = \frac{1}{N-1}\sum_{i = 1}^{N} \left(x_i - \mu_x\right)\left(y_i - \mu_y\right) $$ Both vectors must have the same length, which must be at least 2.
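
The same two-pass idea extends to the covariance estimator: both means are computed first, then the products of deviations are summed. Plain Lua arrays replace sci.alg vectors here (an assumption), and this is a sketch rather than the library's code.

```lua
-- Sketch only: sample covariance of two equal-length Lua arrays, two passes.
local function cov(x, y)
  local n = #x
  assert(n >= 2 and #y == n, "cov needs two arrays of equal length >= 2")
  local sx, sy = 0.0, 0.0
  for i = 1, n do
    sx = sx + x[i]
    sy = sy + y[i]
  end
  local mx, my = sx / n, sy / n -- first pass: both means
  local s = 0.0
  for i = 1, n do               -- second pass: products of deviations
    s = s + (x[i] - mx) * (y[i] - my)
  end
  return s / (n - 1)
end

local c = cov({ 1, 2, 3, 4 }, { 2, 4, 6, 8 }) -- c == 10 / 3 up to rounding
```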

cor = stat.cor(x, y)

Returns the sample correlation between the vectors x and y, based on the estimator: $$ \rho(x,y) = \frac{\text{cov}(x,y)}{\sigma_x\sigma_y} $$ Both vectors must have the same length, which must be at least 2.
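
In the correlation estimator the 1/(N-1) factors of the covariance and of the two standard deviations cancel, so the ratio can be formed directly from centred sums. The plain-Lua sketch below assumes Lua arrays in place of sci.alg vectors and does not guard against a constant input, for which the correlation is undefined (the division yields nan or inf).

```lua
-- Sketch only: sample correlation from centred sums of two Lua arrays.
local function cor(x, y)
  local n = #x
  assert(n >= 2 and #y == n, "cor needs two arrays of equal length >= 2")
  local sx, sy = 0.0, 0.0
  for i = 1, n do
    sx = sx + x[i]
    sy = sy + y[i]
  end
  local mx, my = sx / n, sy / n
  local sxy, sxx, syy = 0.0, 0.0, 0.0
  for i = 1, n do
    local dx, dy = x[i] - mx, y[i] - my
    sxy = sxy + dx * dy -- N-1 times the covariance
    sxx = sxx + dx * dx -- N-1 times the variance of x
    syy = syy + dy * dy -- N-1 times the variance of y
  end
  return sxy / math.sqrt(sxx * syy)
end

local r = cor({ 1, 2, 3, 4 }, { 2, 4, 6, 8 }) -- r == 1
```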

API - Online Accumulators

The following on-line accumulators offer the same statistical estimators introduced above. They are called on-line because the running estimates can be updated with each new observation, via ol:push(x), at a cost that does not depend on the number of observations already taken into account. Each on-line accumulator has a dimension that is set at creation time and cannot be changed afterwards.
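
As an illustration of such a constant-cost update, here is Welford's recurrence for the scalar case. This is a standard technique shown under the assumption that it is representative; the text does not say which recurrence sci.stat uses. Writing \mu_n and M_n for the running mean and the running sum of squared deviations after n observations, with \mu_0 = M_0 = 0:

$$ \mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}, \qquad M_n = M_{n-1} + \left(x_n - \mu_{n-1}\right)\left(x_n - \mu_n\right), \qquad \sigma^2_n = \frac{M_n}{n-1} \quad (n \ge 2) $$

Each step needs only the previous state and the new observation x_n, and \sigma^2_n equals the unbiased variance estimator given for stat.var applied to the first n observations.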

In the following, ol refers to a generic on-line accumulator. All ol objects support the following four methods:

ol:push(x)

Takes the observation x into account. If ol has been initialized with a dimension of 0, x must be a Lua number; otherwise x must be a vector whose length equals the dimension of ol.

ol:clear()

Resets ol to its initial state (no observations taken into account yet).

dim = ol:dim()

Returns the dimension of ol.

len = ol:len()

Returns the number of observations taken into account in ol.
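
The four generic methods can be sketched as a small Lua class. This is a hypothetical skeleton, not sci.stat's implementation: the constructor name new_accumulator is invented, and plain Lua tables stand in for sci.alg vectors. It only counts observations and checks the argument of push against the dimension rule described above.

```lua
-- Hypothetical skeleton of the generic accumulator interface.
-- new_accumulator is an invented name; tables replace sci.alg vectors.
local Accumulator = {}
Accumulator.__index = Accumulator

local function new_accumulator(dim)
  assert(type(dim) == "number" and dim >= 0 and dim % 1 == 0,
         "dim must be a non-negative integer")
  return setmetatable({ _dim = dim, _n = 0 }, Accumulator)
end

function Accumulator:push(x)
  if self._dim == 0 then
    assert(type(x) == "number", "dimension 0 expects a Lua number")
  else
    assert(type(x) == "table" and #x == self._dim,
           "expected a vector of length " .. self._dim)
  end
  -- a real accumulator would also update its running estimates here
  self._n = self._n + 1
end

function Accumulator:clear() self._n = 0 end
function Accumulator:dim() return self._dim end
function Accumulator:len() return self._n end

local ol = new_accumulator(3)
ol:push({ 1, 2, 3 })
ol:push({ 4, 5, 6 })
-- ol:len() == 2, ol:dim() == 3
```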

ol = stat.olmean(dim)

Returns an on-line accumulator of dimension dim which supports the following method:

mu = ol:mean(); ol:mean(y)

If ol has been initialized with a dimension of 0, the first variant is used and the running sample mean is returned. Otherwise the second variant is used: y must be a vector whose length equals the dimension of ol, and the running sample mean is written into y.
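
A plain-Lua counterpart of this accumulator can update the running mean with mu = mu + (x - mu) / n, so each push costs O(dim) regardless of how many observations came before. The sketch mirrors the two variants of mean() described above; the constructor name new_olmean and the use of Lua tables instead of sci.alg vectors are assumptions, and sci.stat's internals may differ.

```lua
-- Hypothetical counterpart of stat.olmean in plain Lua.
local OlMean = {}
OlMean.__index = OlMean

local function new_olmean(dim)
  local self = setmetatable({ _dim = dim }, OlMean)
  self:clear()
  return self
end

-- Reset to the state with no observations.
function OlMean:clear()
  self._n = 0
  if self._dim == 0 then
    self._mu = 0.0
  else
    self._mu = {}
    for i = 1, self._dim do self._mu[i] = 0.0 end
  end
end

-- Incremental mean update; cost does not grow with the number of pushes.
function OlMean:push(x)
  self._n = self._n + 1
  if self._dim == 0 then
    self._mu = self._mu + (x - self._mu) / self._n
  else
    for i = 1, self._dim do
      self._mu[i] = self._mu[i] + (x[i] - self._mu[i]) / self._n
    end
  end
end

-- Dimension 0: return the mean. Otherwise: copy it into the caller's y.
function OlMean:mean(y)
  if self._dim == 0 then
    return self._mu
  end
  for i = 1, self._dim do y[i] = self._mu[i] end
end

local s = new_olmean(0)
for _, v in ipairs({ 1, 2, 3, 4 }) do s:push(v) end
-- s:mean() == 2.5

local v = new_olmean(2)
v:push({ 1, 10 })
v:push({ 3, 30 })
local y = {}
v:mean(y) -- y is now { 2, 20 }
```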

ol = stat.olvar(dim)

Returns an on-line accumulator of dimension dim which supports ol:mean() and the following method:

sigma2 = ol:var(); ol:var(y)

If ol has been initialized with a dimension of 0, the first variant is used and the running sample variance is returned. Otherwise the second variant is used: y must be a vector whose length equals the dimension of ol, and the running sample variance is written into y.
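
For the scalar case, a running variance can be maintained with Welford's update, which keeps the running mean and the running sum of squared deviations m2. The sketch below covers only dimension 0; the constructor name new_olvar is hypothetical, and using Welford's method is an assumption about the technique, not a statement about how sci.stat is implemented.

```lua
-- Hypothetical scalar counterpart of stat.olvar using Welford's update.
local OlVar = {}
OlVar.__index = OlVar

local function new_olvar()
  return setmetatable({ n = 0, mu = 0.0, m2 = 0.0 }, OlVar)
end

function OlVar:push(x)
  self.n = self.n + 1
  local delta = x - self.mu                 -- deviation from the old mean
  self.mu = self.mu + delta / self.n
  self.m2 = self.m2 + delta * (x - self.mu) -- old deviation times new deviation
end

function OlVar:mean() return self.mu end

function OlVar:var()
  assert(self.n >= 2, "variance needs at least two observations")
  return self.m2 / (self.n - 1)
end

local ol = new_olvar()
for _, x in ipairs({ 2, 4, 4, 4, 5, 5, 7, 9 }) do ol:push(x) end
-- ol:mean() is 5 and ol:var() is 32 / 7, both up to rounding
```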

ol = stat.olcov(dim)

Returns an on-line accumulator of dimension dim which supports ol:mean(), ol:var() and the following two methods:

ol:cov(C)

C must be a square matrix whose number of rows and columns equals the dimension of ol; the running sample covariance matrix is written into C.
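
The vector form of Welford's update can maintain a running mean mu and a co-moment matrix M, where M[i][j] accumulates (x_i - mu_i)(x_j - mu_j) over the observations; dividing by n - 1 gives the sample covariance. Each push costs O(dim^2), independent of the number of observations. The constructor name new_olcov and the use of nested Lua tables for matrices are assumptions made for this sketch, which is not sci.stat's code.

```lua
-- Hypothetical counterpart of stat.olcov: running mean plus co-moment matrix.
local OlCov = {}
OlCov.__index = OlCov

local function new_olcov(dim)
  local self = setmetatable({ dim = dim, n = 0, mu = {}, M = {} }, OlCov)
  for i = 1, dim do
    self.mu[i] = 0.0
    self.M[i] = {}
    for j = 1, dim do self.M[i][j] = 0.0 end
  end
  return self
end

function OlCov:push(x)
  self.n = self.n + 1
  local d_old = {}
  for i = 1, self.dim do
    d_old[i] = x[i] - self.mu[i] -- deviation from the old mean
    self.mu[i] = self.mu[i] + d_old[i] / self.n
  end
  for i = 1, self.dim do
    for j = 1, self.dim do
      -- old deviation of component i times new deviation of component j
      self.M[i][j] = self.M[i][j] + d_old[i] * (x[j] - self.mu[j])
    end
  end
end

-- Write the unbiased sample covariance matrix into the caller's C.
function OlCov:cov(C)
  assert(self.n >= 2, "covariance needs at least two observations")
  for i = 1, self.dim do
    C[i] = C[i] or {}
    for j = 1, self.dim do C[i][j] = self.M[i][j] / (self.n - 1) end
  end
end

local ol = new_olcov(2)
for _, x in ipairs({ { 1, 2 }, { 2, 4 }, { 3, 6 } }) do ol:push(x) end
local C = {}
ol:cov(C) -- C is now { { 1, 2 }, { 2, 4 } }
```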

ol:cor(R)

R must be a square matrix whose number of rows and columns equals the dimension of ol; the running sample correlation matrix is written into R.
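
Given a covariance matrix, the correlation matrix follows entry by entry from the estimator \rho(x,y) = \text{cov}(x,y) / (\sigma_x\sigma_y), since the standard deviations are the square roots of the diagonal entries. The helper name cov_to_cor and the nested-table matrix layout are hypothetical; how sci.stat computes ol:cor(R) internally is not stated. A zero diagonal entry (a constant component) makes the result nan or inf.

```lua
-- Hypothetical helper: correlation matrix from a covariance matrix,
-- R[i][j] = C[i][j] / sqrt(C[i][i] * C[j][j]). Matrices are nested tables.
local function cov_to_cor(C, R)
  local n = #C
  for i = 1, n do
    R[i] = R[i] or {}
    for j = 1, n do
      R[i][j] = C[i][j] / math.sqrt(C[i][i] * C[j][j])
    end
  end
  return R
end

local R = cov_to_cor({ { 4, 2 }, { 2, 9 } }, {})
-- R[1][1] == 1, R[2][2] == 1, R[1][2] == R[2][1] == 2 / 6 up to rounding
```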