Class MCNullDist
Null-hypothesis distribution estimated from randomly permuted data labels.
The distribution is estimated by calling fit() with an appropriate
DatasetMeasure or TransferError instance, a training dataset and, in the
case of a TransferError, a validation dataset. For a configurable
number of cycles the training data labels are permuted and the
corresponding measure is computed. In the case of a TransferError this is
the error when predicting the correct labels of the validation dataset.
The distribution can be queried using the cdf() method, which can be
configured to report probabilities/frequencies from the left or right tail,
i.e. the fraction of the distribution that is lower or larger than some
critical value.
This class also supports FeaturewiseDatasetMeasure. In that case cdf()
returns an array of featurewise probabilities/frequencies.
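The procedure described above can be illustrated with a small standalone sketch. It does not use or reproduce the library's implementation: the helper names (`permutation_null`, `empirical_cdf`, `mean_difference`), the `tail` argument and the toy data are all assumptions made for illustration only.

```python
import random

def permutation_null(measure, samples, labels, permutations=100, seed=0):
    """Return the measure computed on `permutations` random label shuffles."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_values = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the association between samples and labels
        null_values.append(measure(samples, shuffled))
    return null_values

def empirical_cdf(null_values, x, tail="left"):
    """Fraction of null values <= x (left tail) or >= x (right tail)."""
    if tail == "left":
        hits = sum(1 for v in null_values if v <= x)
    elif tail == "right":
        hits = sum(1 for v in null_values if v >= x)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return hits / len(null_values)

def mean_difference(samples, labels):
    """Toy measure (hypothetical): difference between the two class means."""
    ones = [s for s, l in zip(samples, labels) if l == 1]
    zeros = [s for s, l in zip(samples, labels) if l == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

samples = [0.1, 0.3, 0.2, 0.4, 1.1, 1.3, 1.2, 1.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

observed = mean_difference(samples, labels)
null_values = permutation_null(mean_difference, samples, labels, permutations=200)
p_right = empirical_cdf(null_values, observed, tail="right")
```

Here `p_right` is the fraction of permutations whose measure is at least as large as the one obtained with the true labels; a small value suggests the observed measure is unlikely under the null hypothesis.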
__init__(self, dist_class=Nonparametric, permutations=100, **kwargs)
Initialize Monte-Carlo Permutation Null-hypothesis testing
fit(self, measure, wdata, vdata=None)
Fit the distribution by performing multiple cycles that repeatedly
permute the labels in the training dataset.
cdf(self, x)
Return the value of the cumulative distribution function at x.
Inherited from NullDist: p
Inherited from misc.state.ClassWithCollections: __getattribute__, __new__, __setattr__, __str__, reset
Inherited from object: __delattr__, __hash__, __reduce__, __reduce_ex__
__permutations
Number of permutations used to estimate the null distribution.
Inherited from object: __class__
__init__(self, dist_class=Nonparametric, permutations=100, **kwargs)
(Constructor)
Initialize Monte-Carlo Permutation Null-hypothesis testing
- Parameters:
dist_class, class - Any class that estimates its parameters with a
fit() method, can be instantiated from those parameters, and
provides a cdf(x) method returning the value of the CDF at x.
All distributions from SciPy's 'stats' module can be used.
permutations, int - Number of label permutations performed to
determine the distribution under the null hypothesis.
- Overrides:
NullDist.__init__
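As a rough illustration of the dist_class contract described above, the following sketch defines two hypothetical classes whose fit() returns parameters from which an instance can be built, and whose cdf(x) evaluates the fitted distribution. The pattern is modelled on how SciPy distributions are commonly used (`norm.fit(data)` followed by `norm(*params).cdf(x)`); the class names and the `build` helper are assumptions, not part of the documented API.

```python
import bisect
from statistics import NormalDist

class EmpiricalDist:
    """Hypothetical nonparametric distribution: its parameters are the samples."""

    def __init__(self, sorted_samples):
        self._sorted = sorted_samples

    @staticmethod
    def fit(samples):
        # Return a tuple of parameters, mirroring the SciPy convention.
        return (sorted(samples),)

    def cdf(self, x):
        # Fraction of stored samples that are <= x.
        return bisect.bisect_right(self._sorted, x) / len(self._sorted)

class GaussianDist:
    """Hypothetical parametric distribution backed by statistics.NormalDist."""

    def __init__(self, mu, sigma):
        self._dist = NormalDist(mu, sigma)

    @staticmethod
    def fit(samples):
        estimate = NormalDist.from_samples(samples)
        return (estimate.mean, estimate.stdev)

    def cdf(self, x):
        return self._dist.cdf(x)

def build(dist_class, samples):
    """Estimate parameters with fit(), then instantiate the distribution."""
    params = dist_class.fit(samples)
    return dist_class(*params)

null_samples = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50]
empirical = build(EmpiricalDist, null_samples)
gaussian = build(GaussianDist, null_samples)
```

The nonparametric variant has to keep every sample, whereas the parametric one only stores two numbers, at the cost of assuming a distribution shape that permutation results may not follow.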
fit(self, measure, wdata, vdata=None)
Fit the distribution by performing multiple cycles that repeatedly
permute the labels in the training dataset.
- Overrides:
NullDist.fit
- Parameters:
measure: (Featurewise)DatasetMeasure | TransferError - TransferError instance used to compute all errors.
wdata: Dataset which gets permuted and used to compute the measure/transfer error multiple times.
vdata: Dataset used for validation. If provided, measure is assumed to be a TransferError, and the working and validation datasets are passed on to it.
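The transfer-error case can be sketched as follows: only the labels of the working (training) dataset are permuted, while the validation dataset keeps its correct labels. This is a standalone illustration under assumed names (`train_nearest_mean`, `transfer_error`, `fit_null`) with a toy one-dimensional nearest-mean classifier; it is not the library's code.

```python
import random

def train_nearest_mean(samples, labels):
    """Return the mean of the 1-D training samples for each class label."""
    means = {}
    for label in set(labels):
        values = [s for s, l in zip(samples, labels) if l == label]
        means[label] = sum(values) / len(values)
    return means

def transfer_error(wdata, vdata):
    """Train on wdata, then return the fraction of vdata samples mislabeled."""
    means = train_nearest_mean(*wdata)
    v_samples, v_labels = vdata
    wrong = 0
    for sample, label in zip(v_samples, v_labels):
        predicted = min(means, key=lambda c: abs(sample - means[c]))
        wrong += predicted != label
    return wrong / len(v_samples)

def fit_null(error_fn, wdata, vdata, permutations=100, seed=0):
    """Collect transfer errors obtained with permuted training labels."""
    rng = random.Random(seed)
    w_samples, w_labels = wdata
    permuted = list(w_labels)
    dist_samples = []
    for _ in range(permutations):
        rng.shuffle(permuted)  # the validation labels are never touched
        dist_samples.append(error_fn((w_samples, permuted), vdata))
    return dist_samples

wdata = ([0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4], [0, 0, 0, 0, 1, 1, 1, 1])
vdata = ([0.15, 0.35, 1.15, 1.35], [0, 0, 1, 1])

observed_error = transfer_error(wdata, vdata)
dist_samples = fit_null(transfer_error, wdata, vdata, permutations=50)
```

Because a low error is the interesting outcome here, the left tail of the resulting distribution would be the relevant one to query.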
cdf(self, x)
Return the value of the cumulative distribution function at x.
- Overrides:
NullDist.cdf
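For a featurewise measure, each permutation yields one value per feature, and the CDF is evaluated separately for every feature. A minimal sketch, assuming an empirical (frequency-based) estimate and a hypothetical `tail` argument; the function name and data are illustrative only:

```python
def featurewise_cdf(dist_samples, x, tail="left"):
    """Return, per feature, the fraction of null samples <= x[j] or >= x[j].

    dist_samples holds one row per permutation and one column per feature.
    """
    if tail not in ("left", "right"):
        raise ValueError("tail must be 'left' or 'right'")
    n_permutations = len(dist_samples)
    result = []
    for j, x_j in enumerate(x):
        column = [row[j] for row in dist_samples]
        if tail == "left":
            hits = sum(1 for v in column if v <= x_j)
        else:
            hits = sum(1 for v in column if v >= x_j)
        result.append(hits / n_permutations)
    return result

# Four permutations of a measure computed on two features.
null_rows = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
observed = [0.35, 0.65]

left = featurewise_cdf(null_rows, observed, tail="left")    # [0.75, 0.25]
right = featurewise_cdf(null_rows, observed, tail="right")  # [0.25, 0.75]
```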
clean(self)
Clean stored distributions.
Storing all of the distributions might be too expensive
(e.g. in the case of Nonparametric), and the scope of the object
might be too broad to wait for it to be destroyed. clean() binds
dist_samples to an empty list so that the garbage collector can reclaim the memory.
_DEV_DOC
- Value:
"""
TODO automagically decide on the number of samples/permutations needed
Caution should be paid though since resultant distributions might be
quite far from some conventional ones (e.g. Normal) -- it is expected to
them to be bimodal (or actually multimodal) in many scenarios.
...
dist_samples
- Value:
StateVariable(enabled=False, doc='Samples obtained for each permutation')