Man page for crossval
NAME
pymvpa2-crossval - cross-validation of a learner's performance
SYNOPSIS
pymvpa2 crossval [--version] [-h] -i DATASET [DATASET ...]
                 --learner LEARNER [--learner-space LEARNER_SPACE]
                 --partitioner PARTITIONER [--errorfx ERRORFX]
                 [--avg-datafold-results]
                 [--balance-training BALANCE_TRAINING]
                 [--sampling-repetitions SAMPLING_REPETITIONS]
                 [--permutations PERMUTATIONS] [--prob-tail {left,right}]
                 -o OUTPUT [--hdf5-compression TYPE]
DESCRIPTION
Cross-validation of a learner's performance
A learner is repeatedly trained and tested on partitions of an input
dataset that are generated by a configurable partitioning scheme.
Each partition usually consists of a training and a testing portion. The
learner is trained on the training portion of the dataset, and its
generalization is then assessed by comparing its predictions for the testing
portion against the true target values.
A summary of the learner's performance is written to STDOUT. Depending on the
particular setup of the cross-validation analysis, either the learner's raw
predictions or summary statistics are returned in an output dataset.
If Monte-Carlo permutation testing is enabled (see --permutations), a second
output dataset with the corresponding p-values is stored as well (filename
suffix '_nullprob').
OPTIONS
- --version
  show program's version and license information and exit
- -h, --help, --help-np
  show this help message and exit. --help-np forcibly disables the use of a
  pager for displaying the help.
- -i DATASET [DATASET ...], --input DATASET [DATASET ...]
  path(s) to one or more PyMVPA dataset files. All datasets will be merged
  into a single dataset (vstack'ed) in order of specification. In some cases
  this option may need to be specified more than once if multiple, but
  separate, input datasets are required.
Options for cross-validation setup:
- --learner LEARNER
  select a learner (trainable node) by its description in the learner
  warehouse (see the 'info' command for a listing), by a colon-separated list
  of capabilities, or by the path to a Python script that creates a
  classifier instance (advanced).
- --learner-space LEARNER_SPACE
  name of a sample attribute that defines the model to be learned by the
  learner. By default, this is the attribute named 'targets'.
- --partitioner PARTITIONER
  select a data folding scheme. Supported arguments are: 'half' for
  split-half partitioning, 'oddeven' for partitioning into odd and even
  chunks, 'group-X' (where X can be any positive integer) for partitioning
  into X groups, and 'n-X' (where X can be any positive integer) for
  leave-X-chunks-out partitioning. By default, partitioners operate on
  dataset chunks that are defined by a 'chunks' sample attribute. The name of
  the "chunking" attribute can be changed by appending a colon and the name
  of the attribute (e.g. 'oddeven:run'). Optionally, the argument to this
  option can also be a file path to a Python script that creates a custom
  partitioner instance (advanced).
- --errorfx ERRORFX
  error function to be applied to the targets and predictions of each
  cross-validation data fold. This can be either the name of any error
  function in PyMVPA's mvpa2.misc.errorfx module or the path to a Python
  script that creates a custom error function (advanced).
- --avg-datafold-results
  average result values across the data folds generated by the partitioner,
  for example to compute the mean prediction error across all folds of a
  cross-validation procedure.
- --balance-training BALANCE_TRAINING
  If enabled, training samples are balanced within each data fold. If the
  keyword 'equal' is given as the argument, an equal number of random samples
  is chosen for each unique target value. The number of samples per category
  is determined by the category with the fewest samples in the respective
  training set. An integer argument causes that number of samples per
  category to be randomly selected. A floating point argument (in the
  interval [0, 1]) indicates what fraction of the available samples shall be
  selected.
- --sampling-repetitions SAMPLING_REPETITIONS
  If training set balancing is enabled, the number of times random sample
  selection is performed for each data fold. Default: 1
- --permutations PERMUTATIONS
  Number of Monte-Carlo permutation runs used to estimate an H0 (null)
  distribution for all cross-validation results. Enabling this option makes
  the corresponding p-values available in the result summary and output.
- --prob-tail {left,right}
  which tail of the probability distribution to report p-values from when
  evaluating permutation test results. For example, a cross-validation that
  computes the mean prediction error could report the left-tail p-value for a
  one-sided test, since a lower-than-chance error indicates above-chance
  performance.
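The interplay of --errorfx and --avg-datafold-results can be sketched as follows: an error function reduces the predictions and targets of one data fold to a single value, and averaging then combines these per-fold values. The mismatch-rate function below is a hypothetical stand-in written for this example (PyMVPA's mvpa2.misc.errorfx module provides its own functions, such as mean_mismatch_error), and the (predictions, targets) argument order is an assumption.

```python
# Hypothetical illustration of per-fold error values and their average.

def mismatch_rate(predictions, targets):
    """Fraction of samples whose prediction differs from the target."""
    return sum(p != t for p, t in zip(predictions, targets)) / len(targets)


# (predictions, targets) of two hypothetical data folds of unequal size
folds = [
    (["a", "b", "b", "a"], ["a", "b", "a", "a"]),  # 1 mismatch out of 4
    (["a", "b"], ["a", "b"]),                      # 0 mismatches out of 2
]
per_fold = [mismatch_rate(p, t) for p, t in folds]
average = sum(per_fold) / len(per_fold)
print(per_fold, average)  # [0.25, 0.0] 0.125
```

Averaging weights every fold equally; when folds differ in size, this is not the same as the error pooled over all testing samples (here 1/6).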
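The balancing rules of --balance-training, together with --sampling-repetitions, can be illustrated with a hypothetical Python sketch operating on the training indices of one data fold. The function names are invented, and interpreting a floating point argument as a fraction of each category's available samples (rounded to the nearest integer) is an assumption, not documented PyMVPA behavior.

```python
import random
from collections import defaultdict


def balance_training(indices, targets, amount, rng):
    """Randomly select a per-category balanced subset of training indices.

    amount: 'equal', an int (samples per category), or a float in [0, 1]
            (assumed here: fraction of each category's available samples)
    """
    by_target = defaultdict(list)
    for i in indices:
        by_target[targets[i]].append(i)
    # size of the smallest category, used by 'equal'
    smallest = min(len(pool) for pool in by_target.values())
    selected = []
    for pool in by_target.values():
        if amount == "equal":
            n = smallest
        elif isinstance(amount, float):
            n = round(amount * len(pool))
        else:
            n = amount  # rng.sample raises ValueError if a category is too small
        selected.extend(rng.sample(pool, n))
    return sorted(selected)


def repeated_balancing(indices, targets, amount, repetitions=1, seed=0):
    """One independent random selection per repetition (cf. --sampling-repetitions)."""
    rng = random.Random(seed)
    return [balance_training(indices, targets, amount, rng)
            for _ in range(repetitions)]


labels = ["a", "a", "a", "a", "b", "b"]
for subset in repeated_balancing(range(6), labels, "equal", repetitions=3):
    print(subset, [labels[i] for i in subset])  # always 2 x 'a' and 2 x 'b'
```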
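The p-values requested with --permutations and --prob-tail can be illustrated with a generic Monte-Carlo sketch: a statistic is recomputed on randomly permuted targets to build an H0 distribution, and the p-value is the fraction of that distribution lying at or beyond the observed value in the chosen tail. This is not PyMVPA's estimator. The function names are hypothetical, labels are shuffled globally rather than, for example, within chunks, and the simple fraction (without the +1 correction some implementations add) is an assumption.

```python
import random


def mc_null_distribution(statistic, targets, n_permutations, seed=0):
    """Recompute `statistic` on randomly permuted targets (global shuffle).

    In a real cross-validation, `statistic` would re-run the whole
    train/test procedure on the permuted targets.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_permutations):
        permuted = list(targets)
        rng.shuffle(permuted)
        null.append(statistic(permuted))
    return null


def mc_pvalue(observed, null, tail):
    """Fraction of null values at least as extreme as `observed` in one tail."""
    if tail == "left":
        hits = sum(v <= observed for v in null)
    elif tail == "right":
        hits = sum(v >= observed for v in null)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return hits / len(null)


# Hypothetical null distribution of mean prediction errors and an observed error
null = [0.5, 0.45, 0.55, 0.6, 0.4, 0.5, 0.35, 0.65]
print(mc_pvalue(0.4, null, "left"))   # 0.25
print(mc_pvalue(0.4, null, "right"))  # 0.875
```

For an error measure, the left tail is the relevant one: a small p-value means few permutations produced an error as low as the observed one.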
Output options:
- -o OUTPUT, --output OUTPUT
  output filename ('.hdf5' extension is added automatically if necessary).
  NOTE: The output format is suitable for data exchange between PyMVPA
  commands, but is not recommended for long-term storage or exchange, as its
  specific content may vary depending on the actual software environment.
  For long-term storage, consider conversion into other data formats (see
  the 'dump' command).
- --hdf5-compression TYPE
  compression type for HDF5 storage. Available values depend on the specific
  HDF5 installation. Typical values are: 'gzip', 'lzf', 'szip', or integers
  from 1 to 9 indicating gzip compression levels.
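The output naming and compression rules can be summarized in a hypothetical Python sketch. Both helper functions are invented for illustration. That '.hdf5' is appended only when missing, that the '_nullprob' suffix of the permutation-test output is placed before the extension, and that any non-numeric value is passed through as a filter name are all assumptions, not documented behavior.

```python
def output_paths(output, with_permutations):
    """Hypothetical: file names derived from the -o argument."""
    base = output[:-len(".hdf5")] if output.endswith(".hdf5") else output
    paths = [base + ".hdf5"]
    if with_permutations:
        # second dataset holding the Monte-Carlo p-values
        paths.append(base + "_nullprob.hdf5")
    return paths


def parse_compression(value):
    """Hypothetical: map a --hdf5-compression value to (filter, level)."""
    if value.isdigit():
        level = int(value)
        if not 1 <= level <= 9:
            raise ValueError("gzip level must be between 1 and 9")
        return "gzip", level
    # which named filters exist depends on the HDF5 installation
    return value, None


print(output_paths("results", with_permutations=True))
# ['results.hdf5', 'results_nullprob.hdf5']
print(parse_compression("4"))  # ('gzip', 4)
```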
AUTHOR
Written by Michael Hanke & Yaroslav Halchenko, and numerous other contributors.
COPYRIGHT
Copyright © 2006-2016 PyMVPA developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.