Man page for preproc
NAME
pymvpa2-preproc - apply preprocessing steps to a PyMVPA dataset
SYNOPSIS
pymvpa2 preproc [--version] [-h] -i DATASET [DATASET ...]
[--chunks CHUNKS_ATTR] [--strip-invariant-features]
[--poly-detrend DEG] [--detrend-chunks CHUNKS_ATTR]
[--detrend-coords COORDS_ATTR] [--detrend-regrs ATTR [ATTR ...]]
[--filter-passband FREQ [FREQ ...]] [--filter-stopband FREQ [FREQ ...]]
[--sampling-rate FREQ] [--filter-passloss dB]
[--filter-stopattenuation dB] [--zscore]
[--zscore-chunks CHUNKS_ATTR] [--zscore-params PARAM PARAM]
-o OUTPUT [--hdf5-compression TYPE]
DESCRIPTION
Preprocess a PyMVPA dataset.
This command can apply a number of preprocessing steps to a dataset. Currently
supported are
1. Polynomial de-trending
2. Spectral filtering
3. Feature-wise Z-scoring
All preprocessing steps are applied in the above order. If a different order is
required, preprocessing has to be split into two separate command calls.
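As an illustration of splitting the work into two calls, the following sketch builds the argument lists for Z-scoring first and detrending second. The file names and the helper function are hypothetical; only the flags come from this page. The commands are printed rather than executed.

```python
def reordered_preproc_calls(src, intermediate, final):
    """Build two 'pymvpa2 preproc' calls: Z-score first, then detrend."""
    first = ["pymvpa2", "preproc", "--zscore",
             "-o", intermediate, "-i", src]
    # According to the -o description, '.hdf5' is appended to the output
    # name if necessary, so the second call reads the name with extension.
    if intermediate.endswith(".hdf5"):
        second_input = intermediate
    else:
        second_input = intermediate + ".hdf5"
    second = ["pymvpa2", "preproc", "--poly-detrend", "1",
              "-o", final, "-i", second_input]
    return [first, second]


# Hypothetical file names, for illustration only.
calls = reordered_preproc_calls("ds.hdf5", "ds_zscored", "ds_final")
for cmd in calls:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) on a system where the pymvpa2 command is installed.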
POLYNOMIAL DE-TRENDING
This type of de-trending can be used to regress out arbitrary signals. In
addition to polynomials of any degree, arbitrary timecourses stored as sample
attributes in a dataset can be used as confound regressors. Unlike the
implementation of spectral filtering, this detrending functionality is
also applicable to sparse-sampled data with potentially irregular inter-sample
intervals.
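The idea can be sketched for a single feature in plain Python. This is not PyMVPA's implementation: it orthogonalizes monomials at the sample positions instead of evaluating Legendre polynomials. Both span the same space of polynomials up to the given degree, so the residual should be the same. The function name and arguments are assumptions made for this example; the sample coordinates may be irregularly spaced but must not all be equal.

```python
def poly_detrend(values, coords, degree):
    """Remove the least-squares polynomial fit up to 'degree'."""
    # Rescale the sample coordinates to the interval [-1, 1].
    lo, hi = min(coords), max(coords)
    x = [2.0 * (c - lo) / (hi - lo) - 1.0 for c in coords]
    residual = [float(v) for v in values]
    basis = []
    for k in range(degree + 1):
        col = [xi ** k for xi in x]
        # Gram-Schmidt: make the new column orthogonal to earlier ones.
        for b in basis:
            proj = sum(c * bi for c, bi in zip(col, b))
            col = [c - proj * bi for c, bi in zip(col, b)]
        norm = sum(c * c for c in col) ** 0.5
        col = [c / norm for c in col]
        basis.append(col)
        # Subtract the projection of the data onto this basis vector.
        coef = sum(r * c for r, c in zip(residual, col))
        residual = [r - coef * c for r, c in zip(residual, col)]
    return residual


# Irregularly spaced samples carrying a purely quadratic trend.
coords = [0.0, 1.0, 3.0, 4.0, 7.0, 10.0]
values = [2.0 + 0.5 * c - 0.1 * c * c for c in coords]
detrended = poly_detrend(values, coords, 2)
```

Because the sample positions enter the fit explicitly, no fixed sampling rate is needed.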
SPECTRAL FILTERING
Several options are provided that are used to construct a Butterworth low-,
high-, or band-pass filter. It is advised to inspect the filtered data
carefully, as inappropriate filter settings can lead to unintended side effects.
Only datasets with a fixed sampling rate are supported. The sampling rate
must be provided.
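Before filtering, it may help to confirm that the sample times really are evenly spaced. The check below is a minimal sketch, not part of the command; the function name and the tolerance are assumptions. It returns the sampling rate in the inverse of the time unit used.

```python
def fixed_sampling_rate(times, rel_tol=1e-6):
    """Return the sampling rate if 'times' are evenly spaced."""
    if len(times) < 2:
        raise ValueError("need at least two sample times")
    steps = [b - a for a, b in zip(times, times[1:])]
    first = steps[0]
    if first <= 0:
        raise ValueError("sample times must be increasing")
    for step in steps:
        if abs(step - first) > rel_tol * first:
            raise ValueError("inter-sample intervals are not constant")
    return 1.0 / first


# Samples acquired every 2 seconds give a rate of 0.5 Hz.
rate = fixed_sampling_rate([0.0, 2.0, 4.0, 6.0])
```

A value obtained this way would be the kind of number to pass to --sampling-rate, with all filter frequencies given in the same unit.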
OPTIONS
- --version
-
show program's version and license information and
exit
- -h, --help, --help-np
-
show this help message and exit. --help-np forcefully
disables the use of a pager for displaying the help.
- -i DATASET [DATASET ...], --input DATASET [DATASET ...]
-
path(s) to one or more PyMVPA dataset files. All
datasets will be merged into a single dataset
(vstack'ed) in order of specification. In some cases
this option may need to be specified more than once if
multiple, but separate, input datasets are required.
Common options for all preprocessing:
- --chunks CHUNKS_ATTR
-
shortcut option to enable uniform chunkwise
processing for all relevant preprocessing steps (see
--zscore-chunks, --detrend-chunks). This global
setting can be overridden by additionally specifying
the corresponding individual "chunk" options.
- --strip-invariant-features
-
After all pre-processing steps are done, strip all
invariant features from the dataset.
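To illustrate what stripping invariant features means, here is a minimal sketch that assumes "invariant" refers to a feature whose value is identical across all samples. The row-of-lists data layout and the function name are assumptions for this example, not the PyMVPA dataset API.

```python
def strip_invariant_features(samples):
    """Drop columns whose value never changes across samples.

    'samples' is a list of rows (one per sample); columns are features.
    Returns the reduced rows and the indices of the kept columns.
    """
    n_features = len(samples[0])
    keep = [j for j in range(n_features)
            if any(row[j] != samples[0][j] for row in samples)]
    stripped = [[row[j] for j in keep] for row in samples]
    return stripped, keep


data = [[1, 5, 2],
        [1, 6, 2],
        [1, 7, 3]]
stripped, kept = strip_invariant_features(data)
```

In this example the first column is constant and is removed, while the other two are kept.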
Options for data detrending:
- --poly-detrend DEG
-
Order of the Legendre polynomial to remove from the
data. This will remove every polynomial up to and
including the provided value. For example, 3 will
remove 0th, 1st, 2nd, and 3rd order polynomials from
the data. N.B.: The 0th polynomial is the baseline
shift, the 1st is the linear trend. If you specify a
single int and the `chunks_attr` parameter is not
None, then this value is used for each chunk. You can
also specify a different polyord value for each chunk
by providing a list or ndarray of polyord values with
the length equal to the number of chunks. Constraints:
value must be convertible to type 'int'. [Default: 1]
- --detrend-chunks CHUNKS_ATTR
-
If None, the whole dataset is detrended at once.
Otherwise, the given samples attribute (given by its
name) is used to define chunks of the dataset that are
processed individually. In that case, all the samples
within a chunk should be in contiguous order and the
chunks should be sorted in order from low to high --
unless the dataset provides information about the
coordinate of each sample in the space that should be
spanned by the polynomials (see `space` argument).
Constraints: value must be `None`, or value must be a
string. [Default: None]
- --detrend-coords COORDS_ATTR
-
name of a samples attribute that is added to the
preprocessed dataset storing the coordinates of each
sample in the space spanned by the polynomials. If an
attribute of such name is already present in the
dataset its values are interpreted as sample
coordinates in the space spanned by the polynomials.
This can be used to detrend datasets with irregular
sample spacing.
- --detrend-regrs ATTR [ATTR ...]
-
List of sample attribute names that should be used as
additional regressors. An example use would be to
regress out motion parameters. Constraints: value must
be `None`, or value must be convertible to list(str).
[Default: None]
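The Legendre polynomials referred to by --poly-detrend can be evaluated with Bonnet's recursion. The sketch below is illustrative only and is not taken from PyMVPA; the function name is an assumption. It shows that the 0th polynomial is constant (the baseline) and the 1st is linear in the coordinate.

```python
def legendre_basis(x, max_degree):
    """Return [P_0(x), ..., P_max_degree(x)] for x in [-1, 1]."""
    values = [1.0]                 # P_0: constant baseline
    if max_degree >= 1:
        values.append(float(x))    # P_1: linear trend
    for n in range(1, max_degree):
        # Bonnet: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        nxt = ((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1)
        values.append(nxt)
    return values


# Polynomials of order 0 to 3, as removed by '--poly-detrend 3'.
basis_at_half = legendre_basis(0.5, 3)
```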
Options for spectral filtering:
- --filter-passband FREQ [FREQ ...]
-
critical frequencies of a Butterworth filter's pass
band. Critical frequencies need to match the unit of
the specified sampling rate (see: --sampling-rate). For
a band-pass filter, the low and high frequency cutoffs
must be specified (in this order). For low- and
high-pass filters, a single cutoff frequency must
be provided. The type of filter (low/high-pass) is
determined from the relation to the stop band
frequency (--filter-stopband).
- --filter-stopband FREQ [FREQ ...]
-
Analogous setting to --filter-passband for specifying the
filter's stop band.
- --sampling-rate FREQ
-
sampling rate of the dataset. All frequency
specifications need to match the unit of the sampling
rate.
- --filter-passloss dB
-
maximum loss in the passband (dB). Default: 1 dB
- --filter-stopattenuation dB
-
minimum attenuation in the stopband (dB). Default: 30
dB
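How the four filter settings interact can be sketched with the textbook formula for the minimum order of an analog Butterworth filter. This is an assumption-laden illustration, not the command's actual design routine: a digital design typically pre-warps the frequencies relative to the sampling rate, so the resulting order may differ. The function names are hypothetical, and two-frequency specifications are simply labeled band-pass as described above.

```python
import math


def classify_filter(passband, stopband):
    """Infer the filter type from pass band and stop band edges."""
    if len(passband) != len(stopband) or len(passband) not in (1, 2):
        raise ValueError("expected one or two frequencies per band")
    if len(passband) == 2:
        return "bandpass"
    if passband[0] < stopband[0]:
        return "lowpass"    # pass band lies below the stop band
    if passband[0] > stopband[0]:
        return "highpass"   # pass band lies above the stop band
    raise ValueError("pass band and stop band edges must differ")


def butterworth_min_order(passband, stopband,
                          passloss_db=1.0, stopatten_db=30.0):
    """Minimum analog Butterworth order for a low- or high-pass spec."""
    kind = classify_filter([passband], [stopband])
    if kind == "lowpass":
        ratio = stopband / passband
    else:
        ratio = passband / stopband
    num = 10 ** (stopatten_db / 10.0) - 1.0
    den = 10 ** (passloss_db / 10.0) - 1.0
    return math.ceil(math.log10(num / den) / (2.0 * math.log10(ratio)))


# Defaults from this page: 1 dB pass band loss, 30 dB stop band attenuation.
order = butterworth_min_order(0.1, 0.2)
```

Moving the stop band edge closer to the pass band edge, or asking for more attenuation, raises the required order.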
Options for data normalization:
- --zscore
-
perform feature normalization by Z-scoring.
- --zscore-chunks CHUNKS_ATTR
-
name of a dataset sample attribute defining chunks of
samples that shall be Z-scored independently. By
default no chunk-wise normalization is done.
- --zscore-params PARAM PARAM
-
define a fixed parameter set (mean, std) for
Z-scoring, instead of computing from actual data.
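The three normalization options can be illustrated for a single feature in plain Python. This sketch assumes the population standard deviation is used when parameters are estimated from the data; whether PyMVPA uses this or the sample standard deviation is not stated here. The function names are hypothetical.

```python
import statistics


def zscore(values, params=None):
    """Z-score one feature; 'params' is an optional fixed (mean, std)."""
    if params is None:
        mean = statistics.mean(values)
        std = statistics.pstdev(values)   # assumption: population std
    else:
        mean, std = params
    if std == 0:
        raise ValueError("cannot Z-score a feature with zero spread")
    return [(v - mean) / std for v in values]


def zscore_by_chunk(values, chunks):
    """Z-score the samples of each chunk independently."""
    out = [0.0] * len(values)
    for label in set(chunks):
        idx = [i for i, c in enumerate(chunks) if c == label]
        scored = zscore([values[i] for i in idx])
        for i, z in zip(idx, scored):
            out[i] = z
    return out


whole = zscore([1.0, 3.0])                              # like --zscore
fixed = zscore([1.0, 2.0, 3.0], params=(2.0, 2.0))      # like --zscore-params 2 2
per_chunk = zscore_by_chunk([1.0, 3.0, 10.0, 30.0], [0, 0, 1, 1])
```

Feature-wise Z-scoring of a whole dataset would apply the same operation to every feature column in turn.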
Output options:
- -o OUTPUT, --output OUTPUT
-
output filename ('.hdf5' extension is added
automatically if necessary). NOTE: The output format
is suitable for data exchange between PyMVPA commands,
but is not recommended for long-term storage or
exchange as its specific content may vary depending on
the actual software environment. For long-term storage
consider conversion into other data formats (see
'dump' command).
- --hdf5-compression TYPE
-
compression type for HDF5 storage. Available values
depend on the specific HDF5 installation. Typical
values are: 'gzip', 'lzf', 'szip', or integers from 1
to 9 indicating gzip compression levels.
pymvpa2-preproc 2.6.0
EXAMPLES
Normalize all features in a dataset by Z-scoring
-
$ pymvpa2 preproc --zscore -o ds_preprocessed -i dataset.hdf5
Perform Z-scoring and quadratic detrending of all features, but process all
samples sharing a unique value of the "chunks" sample attribute individually
-
$ pymvpa2 preproc --chunks "chunks" --poly-detrend 2 --zscore -o ds_pp2 -i ds.hdf5
AUTHOR
Written by Michael Hanke & Yaroslav Halchenko, and numerous other contributors.
COPYRIGHT
Copyright © 2006-2016 PyMVPA developers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.