RFE with nested cross-validation to estimate the optimal number of features.
Given a learner (classifier) with a sensitivity analyzer and a partitioner, SplitRFE first runs a cross-validation of RFE during training in order to estimate the optimal number of features that should survive RFE. The optimal number is chosen as the mid-point among all minima of the error averaged across splits. Having deduced the optimal number of features, SplitRFE applies regular RFE again on the full training dataset, stopping at the estimated optimal number of features.
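The selection rule for the optimal number of features can be sketched in plain NumPy. This is a minimal illustration, not SplitRFE's actual code: `optimal_nfeatures` is a hypothetical helper, and it assumes the nested cross-validation has already produced one error per split and per RFE step, and that "mid-point among all minima" means the middle element of the list of steps tied at the minimum average error.

```python
import numpy as np


def optimal_nfeatures(errors, nfeatures):
    """Pick the number of features to keep from nested-CV RFE errors.

    errors    : array of shape (n_splits, n_steps); errors[i, j] is the
                cross-validation error on split i after RFE step j
    nfeatures : sequence of length n_steps; number of features that
                survived at each RFE step
    """
    errors = np.asarray(errors, dtype=float)
    nfeatures = np.asarray(nfeatures)
    # Average the error curve across cross-validation splits
    mean_errors = errors.mean(axis=0)
    # Indices of all steps that reach the minimum average error
    # (np.isclose guards against floating-point noise in ties)
    min_idx = np.flatnonzero(np.isclose(mean_errors, mean_errors.min()))
    # Assumption: "mid-point" = middle element of the tied minima;
    # with an even count this picks the later of the two middle steps
    mid = min_idx[len(min_idx) // 2]
    return int(nfeatures[mid])


# Example: 2 splits, RFE halving 16 -> 8 -> 4 -> 2 -> 1 features
errors = [[0.40, 0.20, 0.20, 0.20, 0.45],
          [0.30, 0.20, 0.20, 0.20, 0.35]]
print(optimal_nfeatures(errors, [16, 8, 4, 2, 1]))  # 4
```

Taking the middle of a plateau of equally good steps, rather than its first or last element, avoids committing to either edge of a flat error curve; the final RFE pass on the full training set would then stop once this many features remain.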
Notes
Available conditional attributes:
(Conditional attributes enabled by default are suffixed with +)
Examples
Building on the example given for RFE, here is an implementation using the SplitRFE helper:
>>> # Lazy import
>>> from mvpa2.suite import *
>>> # design an RFE feature selection to be used with a classifier
>>> rfe = SplitRFE(
...     LinearCSVMC(),
...     OddEvenPartitioner(),
...     # L2-normalize each split's sensitivities, take absolute values,
...     # then average them across splits
...     fmeasure_postproc=ChainMapper([
...         FxMapper('features', l2_normed),
...         FxMapper('samples', np.abs),
...         FxMapper('samples', np.mean)]),
...     # keep the best 50% of features at each step
...     fselector=FractionTailSelector(
...         0.50,
...         mode='select', tail='upper'),
...     # but do recompute sensitivities at each step
...     update_sensitivity=True)
>>> clf = FeatureSelectionClassifier(
...     LinearCSVMC(),
...     # on features selected via RFE
...     rfe,
...     # custom description
...     descr='LinSVM+RFE(splits_avg)')
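To make the `fmeasure_postproc` chain and the `FractionTailSelector` in the example concrete, here is a plain-NumPy approximation of what they are assumed to compute. It assumes `FxMapper('features', fx)` applies `fx` across the features of each sample (one row per split) and `FxMapper('samples', fx)` applies `fx` across samples for each feature. `postproc_sensitivities` and `select_upper_fraction` are hypothetical helpers, and rounding the kept count up is an assumption, not documented PyMVPA behavior.

```python
import numpy as np


def postproc_sensitivities(sens):
    """NumPy approximation of the fmeasure_postproc chain in the example.

    sens : array of shape (n_splits, n_features), one sensitivity map per split
    (assumes no split yields an all-zero map, which would divide by zero)
    """
    sens = np.asarray(sens, dtype=float)
    # FxMapper('features', l2_normed): scale each split's map to unit L2 norm
    sens = sens / np.linalg.norm(sens, axis=1, keepdims=True)
    # FxMapper('samples', np.abs) then FxMapper('samples', np.mean):
    # absolute values, averaged across splits -> one score per feature
    return np.abs(sens).mean(axis=0)


def select_upper_fraction(scores, fraction=0.5):
    """Indices of the top `fraction` of features (like mode='select', tail='upper')."""
    scores = np.asarray(scores)
    # Assumption: the kept count is rounded up and at least one feature survives
    nkeep = max(1, int(np.ceil(fraction * len(scores))))
    # argsort is ascending: take the last nkeep, return them in feature order
    return np.sort(np.argsort(scores)[-nkeep:])


sens = [[3.0, 0.0, 4.0, 0.0],   # split 1
        [0.0, -1.0, 0.0, 0.0]]  # split 2
scores = postproc_sensitivities(sens)
print(select_upper_fraction(scores))  # [1 2]
```

Normalizing each split before averaging keeps a split with large raw sensitivities from dominating the combined ranking.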
Classifiers and their sensitivities are not the only option for RFE: it can also be used with univariate measures (e.g. OnewayAnova).
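As a sketch of RFE driven by a univariate measure, the block below ranks features by a per-feature one-way ANOVA F statistic and repeatedly keeps the upper half. `oneway_f` and `rfe` are hypothetical NumPy stand-ins, not PyMVPA's `OnewayAnova` or `RFE`; the F statistic follows the standard between-group over within-group mean-square formula, and the rounding of the kept count is an assumption.

```python
import numpy as np


def oneway_f(X, y):
    """Per-feature one-way ANOVA F statistic (a univariate sensitivity)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    grand = X.mean(axis=0)
    # Between-group and within-group sums of squares, per feature
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2
                     for c in labels)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in labels)
    df_between = len(labels) - 1
    df_within = len(y) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def rfe(X, y, n_target, fraction=0.5, measure=oneway_f):
    """Keep the upper `fraction` of features per step until n_target remain.

    Scores are recomputed on the surviving features at every step.
    """
    X = np.asarray(X, dtype=float)
    remaining = np.arange(X.shape[1])
    while len(remaining) > n_target:
        scores = measure(X[:, remaining], y)
        # Never keep fewer than n_target, and always drop at least one feature
        nkeep = max(n_target, min(len(remaining) - 1,
                                  int(np.ceil(fraction * len(remaining)))))
        keep = np.argsort(scores)[-nkeep:]
        remaining = np.sort(remaining[keep])
    return remaining


# 6 samples, 2 classes, 4 features: feature 0 separates the classes
# strongly, feature 2 moderately, features 1 and 3 not at all
X = [[0, 1, 0, 5], [1, 2, 2, 4], [0, 3, 1, 6],
     [10, 1, 3, 5], [11, 2, 5, 6], [10, 3, 4, 4]]
y = [0, 0, 0, 1, 1, 1]
print(rfe(X, y, 1))  # [0]
```

Any function returning one score per feature can be passed as `measure`, which is the sense in which RFE does not depend on a classifier.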
Methods
| Method | Description |
|---|---|
| forward(data) | Map data from input to output space. |
| forward1(data) | Wrapper method to map single samples. |
| generate(ds) | Yield processing results. |
| get_postproc() | Returns the post-processing node or None. |
| get_space() | Query the processing space name of this node. |
| reset() | |
| reverse(data) | Reverse-map data from output back into input space. |
| reverse1(data) | |
| set_postproc(node) | Assigns a post-processing node. |
| set_space(name) | Set the processing space name of this node. |
| train(ds) | The default implementation calls _pretrain(), _train(), and finally _posttrain(). |
| untrain() | Reverts changes in the state of this node caused by previous training. |
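The `train()`/`untrain()` entries describe a template-method pattern: `train()` fixes the order of three hooks and subclasses override the hooks. The sketch below illustrates only that call order; `SketchNode` is a hypothetical class, not PyMVPA's node or learner base class, and the `trained` flag and `calls` log are illustrative additions.

```python
class SketchNode:
    """Hypothetical stand-in illustrating the train()/untrain() template."""

    def __init__(self):
        self.trained = False
        self.calls = []  # records which hooks ran, for illustration only

    def _pretrain(self, ds):
        self.calls.append("_pretrain")

    def _train(self, ds):
        # Subclasses would override this with the actual learning step
        self.calls.append("_train")

    def _posttrain(self, ds):
        self.calls.append("_posttrain")
        self.trained = True

    def train(self, ds):
        # Fixed order, as described for the default implementation
        self._pretrain(ds)
        self._train(ds)
        self._posttrain(ds)

    def untrain(self):
        # Revert the state changed by a previous train() call
        self.trained = False
        self.calls = []


node = SketchNode()
node.train(ds=None)
print(node.calls)  # ['_pretrain', '_train', '_posttrain']
```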
Parameters
lrn : Learner
partitioner : Partitioner
fselector : Functor
errorfx : func, optional
fmeasure_postproc : func, optional
enable_ca : None or list of str
disable_ca : None or list of str
update_sensitivity : bool
filler : optional
auto_train : bool
force_train : bool
space : str, optional
pass_attr : str, list of str|tuple, optional
postproc : Node instance, optional
descr : str