Exclude (do not generate) the provided dataset based on the values of its attributes.
Notes
Available conditional attributes:
(Conditional attributes enabled by default are suffixed with +)
Examples
Typical use case: all possible combinations of two chunks have to be generated, but only the combinations in which both targets are present are of interest.
>>> import numpy as np
>>> from mvpa2.datasets import Dataset
>>> from mvpa2.generators.base import Sifter
>>> from mvpa2.generators.partition import NFoldPartitioner
>>> from mvpa2.base.node import ChainNode
>>> ds = Dataset(samples=np.arange(8).reshape((4,2)),
...              sa={'chunks':  [ 0 ,  1 ,  2 ,  3 ],
...                  'targets': ['c', 'c', 'p', 'p']})
A plain ‘NFoldPartitioner(cvtype=2)’ would also yield partitions in which the two selected chunks are both ‘c’ or both ‘p’. We do not want to include those in our cross-validation, since they would break the balance between training and testing sets.
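To see which pairs are affected, the chunk pairs can be enumerated in plain Python. This is only an illustrative sketch that does not use PyMVPA; the names ‘target_of’, ‘all_pairs’, ‘kept’ and ‘dropped’ are made up for this example, and the chunk-to-target mapping simply mirrors the toy dataset above.

```python
from itertools import combinations

# Same toy layout as the dataset above: one sample per chunk.
chunks = [0, 1, 2, 3]
targets = ['c', 'c', 'p', 'p']
target_of = dict(zip(chunks, targets))

# Every way of selecting two of the four chunks.
all_pairs = list(combinations(chunks, 2))

# Keep only the pairs in which both 'c' and 'p' occur.
wanted = {'c', 'p'}
kept = [pair for pair in all_pairs
        if wanted <= {target_of[ch] for ch in pair}]
dropped = [pair for pair in all_pairs if pair not in kept]

print(all_pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(kept)       # [(0, 2), (0, 3), (1, 2), (1, 3)]
print(dropped)    # [(0, 1), (2, 3)]
```

Of the six possible pairs, the two dropped ones contain a single target only. These are the partitions that the ‘Sifter’ rules are meant to discard.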
>>> par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
...                  Sifter([('partitions', 2),
...                          ('targets', ['c', 'p'])])
...                 ], space='partitions')
We have to pass an appropriate ‘space’ parameter to the ‘ChainNode’ so that any later splitting, e.g. by ‘TransferMeasure’, can operate along that attribute. Here we simply match the default space of ‘NFoldPartitioner’, which is ‘partitions’.
>>> print(par)
<ChainNode: <NFoldPartitioner>-<Sifter: partitions=2, targets=['c', 'p']>>
Additionally, for instance with cvtype > 2, if balance has to be guaranteed (and all other generated partitions discarded), the specification for an attribute can be a dict with ‘uvalues’ and ‘balanced’ keys:
>>> par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
...                  Sifter([('partitions', 2),
...                          ('targets', dict(uvalues=['c', 'p'],
...                                           balanced=True))])
...                 ], space='partitions')
N.B. In this example the definition is equivalent to the previous one, since balance is already guaranteed with cvtype=2 and two unique values requested.
>>> for ds_ in par.generate(ds):
...     testing = ds[ds_.sa.partitions == 2]
...     print(list(zip(testing.sa.chunks, testing.sa.targets)))
[(0, 'c'), (2, 'p')]
[(0, 'c'), (3, 'p')]
[(1, 'c'), (2, 'p')]
[(1, 'c'), (3, 'p')]
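The ‘balanced’ flag only starts to matter once more than two chunks are selected. As a rough plain-Python illustration (not PyMVPA code), assume six chunks, three per target, with four chunks selected at a time. The chunk layout, the helper ‘is_balanced’, and the reading of "balanced" as "every requested value occurs equally often" are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical layout: six chunks, one sample each, three per target.
target_of = {0: 'c', 1: 'c', 2: 'c', 3: 'p', 4: 'p', 5: 'p'}


def is_balanced(selected, uvalues=('c', 'p')):
    """True if every requested value occurs, and all occur equally often."""
    counts = Counter(target_of[ch] for ch in selected)
    per_value = [counts[v] for v in uvalues]
    return min(per_value) > 0 and len(set(per_value)) == 1


# All ways of selecting four of the six chunks.
selections = list(combinations(sorted(target_of), 4))

# Selections in which both targets are merely present.
present = [s for s in selections
           if {'c', 'p'} <= {target_of[ch] for ch in s}]

# Selections in which both targets occur equally often (2 + 2).
balanced = [s for s in selections if is_balanced(s)]

print(len(selections), len(present), len(balanced))  # 15 15 9
```

Here every selection contains both targets, so a presence-only rule would keep all fifteen of them, while only the nine selections with two ‘c’ and two ‘p’ chunks are balanced.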
Methods
generate(ds) | Validate obtained dataset and yield if matches |
get_postproc() | Returns the post-processing node or None. |
get_space() | Query the processing space name of this node. |
reset() | |
set_postproc(node) | Assigns a post-processing node |
set_space(name) | Set the processing space name of this node. |
Parameters
includes : list
enable_ca : None or list of str
disable_ca : None or list of str
space : str, optional
pass_attr : str, list of str|tuple, optional
postproc : Node instance, optional
descr : str
generate(ds)
Validate the obtained dataset and yield it if it matches.