Adaptor for some models from the StatsModels package
This adaptor fits statistical models to univariate (in StatsModels terminology “endogenous”) data. A model, based on “exogenous” data (i.e. a design matrix) and optional parameters, is fitted to each feature vector in a given dataset individually. The adaptor supports a variety of models provided by the StatsModels package, including ordinary least squares (OLS), generalized least squares (GLS) and others.
This feature-wise measure can extract a variety of properties from the model fit results and aggregate them into a result dataset. This includes, for example, all attributes of a StatsModels RegressionResults class, such as model parameters and their error estimates, Akaike’s information criterion, and a number of other statistical properties.
Moreover, it is possible to perform t-contrasts/t-tests of parameter estimates, as well as F-tests for contrast matrices.
Notes
Available conditional attributes:
(Conditional attributes enabled by default suffixed with +)
Examples
Some example data: two features, seven samples
>>> endog = Dataset(np.transpose([[1, 2, 3, 4, 5, 6, 8],
... [1, 2, 1, 2, 1, 2, 1]]))
>>> exog = range(7)
Set up a model generator – it yields an instance of an OLS model for a particular design and feature vector. The generator will be called internally for each feature in the dataset.
>>> model_gen = lambda y, x: sm.OLS(y, x)
Configure the adaptor with the model generator and a common design for all feature model fits. Tell the adaptor to auto-add a constant to the design.
>>> usm = UnivariateStatsModels(exog, model_gen, add_constant=True)
Run the measure. By default it extracts the parameter estimates from the models (two per feature/model: regressor + constant).
>>> res = usm(endog)
>>> print res
<Dataset: 2x2@float64, <sa: descr>>
>>> print res.sa.descr
['params' 'params']
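To make the feature-wise fitting concrete, the following is a minimal, dependency-free sketch of what the 2x2 result above corresponds to: one straight-line fit per feature, with the estimates stacked into one row per parameter and one column per feature. The helper names `fit_line` and `featurewise` are hypothetical and not part of the adaptor or of StatsModels, and the ordering (regressor first, constant second) is an assumption taken from the wording above.

```python
def fit_line(y, x):
    """Least-squares fit of y = slope * x + intercept."""
    n = float(len(x))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [slope, y_mean - slope * x_mean]


def featurewise(samples, x, fit):
    """Fit every feature (column) separately and stack the results.

    Returns one row per extracted value and one column per feature.
    """
    per_feature = [fit(column, x) for column in zip(*samples)]
    return [list(row) for row in zip(*per_feature)]


# Same numbers as the example data: seven samples, two features.
samples = [[1, 1], [2, 2], [3, 1], [4, 2], [5, 1], [6, 2], [8, 1]]
x = list(range(7))
params = featurewise(samples, x, fit_line)
# params[0] holds the slopes, params[1] the intercepts (one entry per feature)
```

The second feature alternates between 1 and 2 without any trend, so its slope comes out as (numerically) zero and its intercept is simply the feature mean.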
Alternatively, extract t-values for a test of all parameter estimates against zero.
>>> usm = UnivariateStatsModels(exog, model_gen, res='tvalues',
... add_constant=True)
>>> res = usm(endog)
>>> print res
<Dataset: 2x2@float64, <sa: descr>>
>>> print res.sa.descr
['tvalues' 'tvalues']
Compute a t-contrast that tests whether the first parameter is non-zero. This returns additional test statistics, such as p-value and effect size, in the result dataset. The contrast vector is passed on to the t_test() function (r_matrix argument) of the StatsModels result class.
>>> usm = UnivariateStatsModels(exog, model_gen, res=[1,0],
... add_constant=True)
>>> res = usm(endog)
>>> print res
<Dataset: 6x2@float64, <sa: descr>>
>>> print res.sa.descr
['tvalue' 'pvalue' 'effect' 'sd' 'df' 'zvalue']
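The 'effect', 'sd', 'tvalue' and 'df' entries can be illustrated with the textbook formulas for a t-contrast: the effect is the contrast-weighted sum of the estimates, sd is the square root of the contrast applied to the parameter covariance matrix, and the t-value is their ratio. This is a sketch under the assumption of a single regressor plus constant; the function names are hypothetical, and the p-value and z-value are left out because they need a t-distribution that the standard library does not provide.

```python
import math


def fit_line_with_cov(y, x):
    """Fit y = slope * x + intercept.

    Returns the estimates [slope, intercept], their 2x2 covariance
    matrix, and the residual degrees of freedom.
    """
    n = float(len(x))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    df = n - 2
    # Residual variance: sum of squared residuals over residual df.
    s2 = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y)) / df
    off_diag = -x_mean * s2 / sxx
    cov = [[s2 / sxx, off_diag],
           [off_diag, s2 * (1.0 / n + x_mean ** 2 / sxx)]]
    return [slope, intercept], cov, df


def t_contrast(beta, cov, c):
    """Return (tvalue, effect, sd) for the contrast vector c."""
    effect = sum(ci * bi for ci, bi in zip(c, beta))
    variance = sum(c[i] * cov[i][j] * c[j]
                   for i in range(len(c)) for j in range(len(c)))
    sd = math.sqrt(variance)
    return effect / sd, effect, sd


y = [1, 2, 3, 4, 5, 6, 8]  # first feature of the example data
x = list(range(7))
beta, cov, df = fit_line_with_cov(y, x)
tvalue, effect, sd = t_contrast(beta, cov, [1, 0])
```

With the contrast [1, 0] the effect is just the slope estimate and sd its standard error, so the resulting t-value is the same as the per-parameter t-value of the first parameter.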
F-test for a contrast matrix, again with additional test statistics in the result dataset. The contrast matrix is passed on to the f_test() function (r_matrix argument) of the StatsModels result class.
>>> usm = UnivariateStatsModels(exog, model_gen, res=[[1,0],[0,1]],
... add_constant=True)
>>> res = usm(endog)
>>> print res
<Dataset: 4x2@float64, <sa: descr>>
>>> print res.sa.descr
['fvalue' 'pvalue' 'df_num' 'df_denom']
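For orientation, the F statistic of such a contrast matrix R can be written as the Wald form (R b)' [R V R']^-1 (R b) / q, where b are the parameter estimates, V is their covariance matrix and q is the number of rows of R. The sketch below applies this textbook formula to the first example feature. It is restricted to two-row contrasts so that a 2x2 inverse suffices, the function names are hypothetical, and the p-value is omitted.

```python
def fit_line_with_cov(y, x):
    """Fit y = slope * x + intercept; return estimates, covariance, residual df."""
    n = float(len(x))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    df = n - 2
    s2 = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y)) / df
    off_diag = -x_mean * s2 / sxx
    cov = [[s2 / sxx, off_diag],
           [off_diag, s2 * (1.0 / n + x_mean ** 2 / sxx)]]
    return [slope, intercept], cov, df


def f_contrast(beta, cov, R):
    """Return (fvalue, df_num) for a two-row contrast matrix R."""
    q = len(R)
    if q != 2:
        raise ValueError("this sketch only inverts 2x2 matrices")
    k = len(beta)
    # Contrast estimates R * beta.
    rb = [sum(R[a][j] * beta[j] for j in range(k)) for a in range(q)]
    # Covariance of the contrast estimates: R * cov * R^T.
    m = [[sum(R[a][i] * cov[i][j] * R[b][j]
              for i in range(k) for j in range(k))
          for b in range(q)] for a in range(q)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    wald = sum(rb[a] * inv[a][b] * rb[b] for a in range(q) for b in range(q))
    return wald / q, q


y = [1, 2, 3, 4, 5, 6, 8]  # first feature of the example data
x = list(range(7))
beta, cov, df_denom = fit_line_with_cov(y, x)
fvalue, df_num = f_contrast(beta, cov, [[1, 0], [0, 1]])
```

The identity matrix used here tests all parameters against zero jointly, which is why df_num equals the number of parameters and df_denom is the residual degrees of freedom.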
For any custom result extraction, a callable can be passed to the res argument. This object will be called with the result of each model fit. Its return value(s) will be aggregated into a result dataset.
>>> def extractor(res):
...     return [res.aic, res.bic]
>>>
>>> usm = UnivariateStatsModels(exog, model_gen, res=extractor,
... add_constant=True)
>>> res = usm(endog)
>>> print res
<Dataset: 2x2@float64>
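The extractor pattern itself is independent of StatsModels: any callable that takes a fit result and returns a sequence of values will do. As a dependency-free illustration, the sketch below computes AIC and BIC from the textbook Gaussian log-likelihood of a least-squares fit. The `FitResult` container and the helper names are hypothetical stand-ins for a real result object, and it is an assumption that these definitions coincide with the values StatsModels reports in every configuration.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a model fit result.
FitResult = namedtuple("FitResult", ["nobs", "ssr", "n_params"])


def fit_line(y, x):
    """Fit y = slope * x + intercept and summarize the fit."""
    n = float(len(x))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    ssr = sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))
    return FitResult(nobs=len(x), ssr=ssr, n_params=2)


def gaussian_loglike(fit):
    """Maximized Gaussian log-likelihood of a least-squares fit."""
    n = fit.nobs
    return -n / 2.0 * (math.log(2 * math.pi) + math.log(fit.ssr / n) + 1)


def extractor(fit):
    """Return [AIC, BIC] for one fit, mirroring the callable above."""
    llf = gaussian_loglike(fit)
    aic = 2 * fit.n_params - 2 * llf
    bic = math.log(fit.nobs) * fit.n_params - 2 * llf
    return [aic, bic]


y = [1, 2, 3, 4, 5, 6, 8]  # first feature of the example data
x = list(range(7))
aic, bic = extractor(fit_line(y, x))
```

Because the extractor returns two values per fit, aggregating it over two features yields a 2x2 result, matching the dataset shape shown above.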
Methods
generate(ds) | Yield processing results. |
get_postproc() | Returns the post-processing node or None. |
get_space() | Query the processing space name of this node. |
reset() | |
set_postproc(node) | Assigns a post-processing node |
set_space(name) | Set the processing space name of this node. |
train(ds) | The default implementation calls _pretrain(), _train(), and finally _posttrain(). |
untrain() | Reverts changes in the state of this node caused by previous training |
Parameters
exog : array-like
model_gen : callable
res : {‘params’, ‘tvalues’, ...} or 1d array or 2d array or callable
add_constant : bool, optional
enable_ca : None or list of str
disable_ca : None or list of str
null_dist : instance of distribution estimator
auto_train : bool
force_train : bool
space : str, optional
pass_attr : str, list of str|tuple, optional
postproc : Node instance, optional
descr : str