algorithms.statistics.empirical_pvalue¶
Module: algorithms.statistics.empirical_pvalue
¶
Inheritance diagram for nipy.algorithms.statistics.empirical_pvalue
:

Routines to get corrected p-values estimates, based on the observations.
It implements 3 approaches:
Benjamini-Hochberg FDR: http://en.wikipedia.org/wiki/False_discovery_rate
a class that fits a Gaussian model to the central part of an histogram, following [1]
[1] Schwartzman A, Dougherty RF, Lee J, Ghahremani D, Taylor JE. Empirical null and false discovery rate analysis in neuroimaging. Neuroimage. 2009 Jan 1;44(1):71-82. PubMed PMID: 18547821. DOI: 10.1016/j.neuroimage.2008.04.182
This is typically necessary to estimate a FDR when one is not certain that the data behaves as a standard normal under H_0.
a model based on Gaussian mixture modelling ‘a la Oxford’
Author : Bertrand Thirion, Yaroslav Halchenko, 2008-2012
Class¶
NormalEmpiricalNull
¶
-
class
nipy.algorithms.statistics.empirical_pvalue.
NormalEmpiricalNull
(x)¶ Bases:
object
Class to compute the empirical null normal fit to the data.
The data which is used to estimate the FDR, assuming a Gaussian null from Schwartzmann et al., NeuroImage 44 (2009) 71–82
Methods
fdr
(theta)Given a threshold theta, find the estimated FDR fdrcurve
()Returns the FDR associated with any point of self.x learn
([left, right])Estimate the proportion, mean and variance of a Gaussian distribution plot
([efp, alpha, bar, mpaxes])Plot the histogram of x threshold
([alpha, verbose])Compute the threshold corresponding to an alpha-level FDR for x uncorrected_threshold
([alpha, verbose])Compute the threshold corresponding to a specificity alpha for x -
__init__
(x)¶ Initialize an empirical null normal object.
Parameters: x : 1D ndarray
The data used to estimate the empirical null.
-
fdr
(theta)¶ Given a threshold theta, find the estimated FDR
Parameters: theta : float or array of shape (n_samples)
values to test
Returns: afp : value of array of shape(n)
-
fdrcurve
()¶ Returns the FDR associated with any point of self.x
-
learn
(left=0.2, right=0.8)¶ Estimate the proportion, mean and variance of a Gaussian distribution for a fraction of the data
Parameters: left: float, optional :
Left cut parameter to prevent fitting non-gaussian data
right: float, optional :
Right cut parameter to prevent fitting non-gaussian data
Notes
This method stores the following attributes:
- mu = mu
- p0 = min(1, np.exp(lp0))
- sqsigma: variance of the estimated normal distribution
- sigma: np.sqrt(sqsigma) : standard deviation of the estimated normal distribution
-
plot
(efp=None, alpha=0.05, bar=1, mpaxes=None)¶ Plot the histogram of x
Parameters: efp : float, optional
The empirical FDR (corresponding to x) if efp==None, the false positive rate threshold plot is not drawn.
alpha : float, optional
The chosen FDR threshold
bar=1 : bool, optional
mpaxes=None: if not None, handle to an axes where the fig :
will be drawn. Avoids creating unnecessarily new figures :
-
threshold
(alpha=0.05, verbose=0)¶ Compute the threshold corresponding to an alpha-level FDR for x
Parameters: alpha : float, optional
the chosen false discovery rate threshold.
verbose : boolean, optional
the verbosity level, if True a plot is generated.
Returns: theta: float :
the critical value associated with the provided FDR
-
uncorrected_threshold
(alpha=0.001, verbose=0)¶ Compute the threshold corresponding to a specificity alpha for x
Parameters: alpha : float, optional
the chosen false discovery rate (FDR) threshold.
verbose : boolean, optional
the verbosity level, if True a plot is generated.
Returns: theta: float :
the critical value associated with the provided p-value
-
Functions¶
-
nipy.algorithms.statistics.empirical_pvalue.
check_p_values
(p_values)¶ Basic checks on the p_values array: values should be within [0,1]
Assures also that p_values are at least in 1d array. None of the checks is performed if p_values is None.
Parameters: p_values : array of shape (n)
The sample p-values
Returns: p_values : array of shape (n)
The sample p-values
-
nipy.algorithms.statistics.empirical_pvalue.
fdr
(p_values=None, verbose=0)¶ Returns the FDR associated with each p value
Parameters: p_values : ndarray of shape (n)
The samples p-value
Returns: q : array of shape(n)
The corresponding fdr values
-
nipy.algorithms.statistics.empirical_pvalue.
fdr_threshold
(p_values, alpha=0.05)¶ Return FDR threshold given p values
Parameters: p_values : array of shape (n), optional
The samples p-value
alpha : float, optional
The desired FDR significance
Returns: critical_p_value: float :
The p value corresponding to the FDR alpha
-
nipy.algorithms.statistics.empirical_pvalue.
gamma_gaussian_fit
(x, test=None, verbose=0, mpaxes=False, bias=1, gaussian_mix=0, return_estimator=False)¶ Computing some prior probabilities that the voxels of a certain map are in class disactivated, null or active using a gamma-Gaussian mixture
Parameters: x: array of shape (nvox,) :
the map to be analysed
test: array of shape (nbitems,), optional :
the test values for which the p-value needs to be computed by default, test = x
verbose: 0, 1 or 2, optional :
verbosity mode, 0 is quiet, and 2 calls matplotlib to display graphs.
mpaxes: matplotlib axes, optional :
axes handle used to plot the figure in verbose mode if None, new axes are created if false, nothing is done
bias: float, optional :
lower bound on the Gaussian variance (to avoid shrinkage)
gaussian_mix: float, optional :
if nonzero, lower bound on the Gaussian mixing weight (to avoid shrinkage)
return_estimator: boolean, optional :
if return_estimator is true, the estimator object is returned.
Returns: bfp: array of shape (nbitems,3) :
The probability of each component in the mixture model for each test value
estimator: nipy.labs.clustering.ggmixture.GGGM object :
The estimator object, returned only if return_estimator is true.
-
nipy.algorithms.statistics.empirical_pvalue.
gaussian_fdr
(x)¶ Return the FDR associated with each value assuming a Gaussian distribution
-
nipy.algorithms.statistics.empirical_pvalue.
gaussian_fdr_threshold
(x, alpha=0.05)¶ Return FDR threshold given normal variates
Given an array x of normal variates, this function returns the critical p-value associated with alpha. x is explicitly assumed to be normal distributed under H_0
Parameters: x: ndarray :
input data
alpha: float, optional :
desired significance
Returns: threshold : float
threshold, given as a Gaussian critical value
-
nipy.algorithms.statistics.empirical_pvalue.
smoothed_histogram_from_samples
(x, bins=None, nbins=256, normalized=False)¶ Smooth histogram corresponding to density underlying the samples in x
Parameters: x: array of shape(n_samples) :
input data
bins: array of shape(nbins+1), optional :
the bins location
nbins: int, optional :
the number of bins of the resulting histogram
normalized: bool, optional :
if True, the result is returned as a density value
Returns: h: array of shape (nbins) :
the histogram
bins: array of shape(nbins+1), :
the bins location
-
nipy.algorithms.statistics.empirical_pvalue.
three_classes_GMM_fit
(x, test=None, alpha=0.01, prior_strength=100, verbose=0, fixed_scale=False, mpaxes=None, bias=0, theta=0, return_estimator=False)¶ Fit the data with a 3-classes Gaussian Mixture Model, i.e. compute some probability that the voxels of a certain map are in class disactivated, null or active
Parameters: x: array of shape (nvox,1) :
The map to be analysed
test: array of shape(nbitems,1), optional :
the test values for which the p-value needs to be computed by default (if None), test=x
alpha: float, optional :
the prior weights of the positive and negative classes
prior_strength: float, optional :
the confidence on the prior (should be compared to size(x))
verbose: int :
verbosity mode
fixed_scale: bool, optional :
boolean, variance parameterization. if True, the variance is locked to 1 otherwise, it is estimated from the data
mpaxes: :
axes handle used to plot the figure in verbose mode if None, new axes are created
bias: bool :
allows a rescaling of the posterior probability that takes into account the threshold theta. Not rigorous.
theta: float :
the threshold used to correct the posterior p-values when bias=1; normally, it is such that test>theta note that if theta = -np.inf, the method has a standard behaviour
return_estimator: boolean, optional :
If return_estimator is true, the estimator object is returned.
Returns: bfp : array of shape (nbitems,3):
the posterior probability of each test item belonging to each component in the GMM (sum to 1 across the 3 classes) if np.size(test)==0, i.e. nbitem==0, None is returned
estimator : nipy.labs.clustering.GMM object
The estimator object, returned only if return_estimator is true.
Notes
Our convention is that:
- class 1 represents the negative class
- class 2 represents the null class
- class 3 represents the positive class