algorithms.clustering.gmm¶
Module: algorithms.clustering.gmm¶
Inheritance diagram for nipy.algorithms.clustering.gmm:

Gaussian Mixture Model class: contains the basic fields and methods of GMMs. The class GMM_old uses C bindings, which are computationally and memory efficient.
Author: Bertrand Thirion, 2006-2009
Classes¶
GMM¶
-
class nipy.algorithms.clustering.gmm.GMM(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶
Bases: object
Standard GMM.
This class contains the following members:
- k (int): the number of components in the mixture
- dim (int): the dimension of the data
- prec_type = ‘full’ (string): the parameterization of the precision/covariance matrices, either ‘full’ or ‘diagonal’
- means: array of shape (k, dim), the means (mean parameters) of the components
- precisions: array of shape (k, dim, dim), the precisions (inverse covariance matrices) of the components
- weights: array of shape (k), the weights of the mixture
Methods
average_log_like(x[, tiny])    returns the averaged log-likelihood of the model for the dataset x
bic(like[, tiny])    Computation of bic approximation of evidence
check()    Checking the shape of the different matrices involved in the model
check_x(x)    essentially check that x.shape[1] == self.dim
estimate(x[, niter, delta, verbose])    Estimation of the model given a dataset x
evidence(x)    Computation of bic approximation of evidence
guess_regularizing(x[, bcheck])    Set the regularizing priors as weakly informative
initialize(x)    Initializes self according to a certain dataset x
initialize_and_estimate(x[, z, niter, …])    Estimation of self given x
likelihood(x)    return the likelihood of the model for the data x
map_label(x[, like])    return the MAP labelling of x
mixture_likelihood(x)    Returns the likelihood of the mixture for x
plugin(means, precisions, weights)    Set manually the weights, means and precisions of the model
pop(like[, tiny])    compute the population, i.e. the statistics of allocation
show(x, gd[, density, axes])    Function to plot a GMM, still in progress
show_components(x, gd[, density, mpaxes])    Function to plot a GMM; currently works only in 1D
test(x[, tiny])    Returns the log-likelihood of the mixture for x
train(x[, z, niter, delta, ninit, verbose])    Idem initialize_and_estimate
unweighted_likelihood(x)    return the likelihood of each datum for each component
unweighted_likelihood_(x)    return the likelihood of each datum for each component
update(x, l)    Identical to self._Mstep(x, l)
-
__init__(k=1, dim=1, prec_type='full', means=None, precisions=None, weights=None)¶
Initialize the structure, at least with the dimensions of the problem
Parameters:
k (int): the number of classes of the model
dim (int): the dimension of the problem
prec_type = ‘full’: covariance/precision parameterization (diagonal ‘diag’ or full ‘full’)
means = None: array of shape (self.k, self.dim)
precisions = None: array of shape (self.k, self.dim, self.dim) or (self.k, self.dim)
weights = None: array of shape (self.k)
By default, means, precisions and weights are set as zeros(), eye() and 1/k ones(), with the correct dimensions.
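Example (a minimal construction sketch based on the signature above; the parameter values are illustrative only):
from nipy.algorithms.clustering.gmm import GMM
# a three-component model on two-dimensional data, with full covariance parameterization
model = GMM(k=3, dim=2, prec_type='full')
# means, precisions and weights keep their defaults (zeros, identity, 1/k)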
-
average_log_like(x, tiny=1e-15)¶
Returns the averaged log-likelihood of the model for the dataset x
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
tiny = 1.e-15: a small constant to avoid numerical singularities
-
bic(like, tiny=1e-15)¶
Computation of bic approximation of evidence
Parameters:
like: array of shape (n_samples, self.k), component-wise likelihood
tiny = 1.e-15: a small constant to avoid numerical singularities
Returns: the bic value, float
-
check()¶
Checking the shape of the different matrices involved in the model
-
check_x(x)¶
Essentially check that x.shape[1] == self.dim; x is returned, possibly reshaped
-
estimate(x, niter=100, delta=0.0001, verbose=0)¶
Estimation of the model given a dataset x
Parameters:
x: array of shape (n_samples, dim), the data from which the model is estimated
niter = 100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
verbose = 0: verbosity mode
Returns: bic, an asymptotic approximation of model evidence
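Example (an illustrative run on a placeholder dataset X; the model is initialized before estimation, as initialize_and_estimate below does):
import numpy as np
from nipy.algorithms.clustering.gmm import GMM
X = np.random.randn(500, 2)                      # placeholder dataset
model = GMM(k=3, dim=2)
model.initialize(X)                              # regularizing priors + k-means start
bic = model.estimate(X, niter=100, delta=1.e-4)  # bic approximates the model evidence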
-
evidence(x)¶
Computation of bic approximation of evidence
Parameters:
x: array of shape (n_samples, dim), the data from which bic is computed
Returns: the bic value
-
guess_regularizing(x, bcheck=1)¶
Set the regularizing priors as weakly informative, following Fraley and Raftery, Journal of Classification 24:155-181 (2007)
Parameters:
x: array of shape (n_samples, dim), the data used in the estimation process
-
initialize(x)¶
Initializes self according to a certain dataset x: 1. sets the regularizing hyper-parameters, 2. initializes z using a k-means algorithm, then 3. updates the parameters
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
-
initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶
Estimation of self given x
Parameters:
x: array of shape (n_samples, dim), the data from which the model is estimated
z = None: array of shape (n_samples), a prior labelling of the data to initialize the computation
niter = 100: maximal number of iterations in the estimation process
delta = 1.e-4: increment of data likelihood at which convergence is declared
ninit = 1: number of initializations performed to reach a good solution
verbose = 0: verbosity mode
Returns: the best model is returned
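Example (a sketch of a complete fit with several random initializations; X is a placeholder array):
import numpy as np
from nipy.algorithms.clustering.gmm import GMM
X = np.random.randn(500, 2)                        # placeholder dataset
model = GMM(k=3, dim=2)
best = model.initialize_and_estimate(X, ninit=5)   # the best of the 5 runs is returned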
-
likelihood(x)¶
Return the likelihood of the model for the data x; the values are weighted by the component weights
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
Returns: like, array of shape (n_samples, self.k), component-wise likelihood
-
map_label(x, like=None)¶
Return the MAP labelling of x
Parameters:
x: array of shape (n_samples, dim), the data under study
like = None: array of shape (n_samples, self.k), component-wise likelihood; if like==None, it is recomputed
Returns: z, array of shape (n_samples), the resulting MAP labelling of the rows of x
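Example (a labelling sketch on a placeholder dataset X; the model is fitted first):
import numpy as np
from nipy.algorithms.clustering.gmm import GMM
X = np.random.randn(500, 2)                          # placeholder dataset
model = GMM(k=3, dim=2).initialize_and_estimate(X)   # fit before labelling
like = model.likelihood(X)                           # component-wise likelihood, shape (n_samples, k)
z = model.map_label(X, like=like)                    # MAP component index for each row of X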
-
mixture_likelihood(x)¶
Returns the likelihood of the mixture for x
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
-
plugin(means, precisions, weights)¶
Set manually the weights, means and precisions of the model
Parameters:
means: array of shape (self.k, self.dim)
precisions: array of shape (self.k, self.dim, self.dim) or (self.k, self.dim)
weights: array of shape (self.k)
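Example (a sketch of setting the parameters by hand instead of estimating them; the numbers are arbitrary):
import numpy as np
from nipy.algorithms.clustering.gmm import GMM
model = GMM(k=2, dim=2)
means = np.array([[0., 0.], [3., 3.]])           # shape (k, dim)
precisions = np.array([np.eye(2), np.eye(2)])    # shape (k, dim, dim)
weights = np.array([0.5, 0.5])                   # shape (k,)
model.plugin(means, precisions, weights)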
-
pop(like, tiny=1e-15)¶
Compute the population, i.e. the statistics of allocation
Parameters:
like: array of shape (n_samples, self.k), the likelihood of each item being in each class
-
show(x, gd, density=None, axes=None)¶
Function to plot a GMM, still in progress; currently works only in 1D and 2D
Parameters:
x: array of shape (n_samples, dim), the data under study
gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins)), density of the model on the discrete grid implied by gd; by default, this is recomputed
-
show_components(x, gd, density=None, mpaxes=None)¶
Function to plot a GMM; currently works only in 1D
Parameters:
x: array of shape (n_samples, dim), the data under study
gd: GridDescriptor instance
density: array of shape (prod(gd.n_bins)), density of the model on the discrete grid implied by gd; by default, this is recomputed
mpaxes: axes handle to make the figure, optional; if None, a new figure is created
-
test(x, tiny=1e-15)¶
Returns the log-likelihood of the mixture for x
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
Returns: ll, array of shape (n_samples), the log-likelihood of the rows of x
-
train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)¶
Idem initialize_and_estimate
-
unweighted_likelihood(x)¶
Return the likelihood of each datum under each component; the values are not weighted by the component weights
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
Returns: like, array of shape (n_samples, self.k), unweighted component-wise likelihood
Notes
Hopefully faster
-
unweighted_likelihood_(x)¶
Return the likelihood of each datum under each component; the values are not weighted by the component weights
Parameters:
x: array of shape (n_samples, self.dim), the data used in the estimation process
Returns: like, array of shape (n_samples, self.k), unweighted component-wise likelihood
-
update(x, l)¶
Identical to self._Mstep(x, l)
GridDescriptor¶
-
class nipy.algorithms.clustering.gmm.GridDescriptor(dim=1, lim=None, n_bins=None)¶
Bases: object
A tiny class to handle Cartesian grids
Methods
make_grid()    Compute the grid points
set(lim[, n_bins])    set the limits of the grid and the number of bins
-
__init__(dim=1, lim=None, n_bins=None)¶
Parameters:
dim: int, optional, the dimension of the grid
lim: list of len(2*self.dim), the limits of the grid as (xmin, xmax, ymin, ymax, …)
n_bins: list of len(self.dim), the number of bins in each direction
-
make_grid()¶
Compute the grid points
Returns: grid, array of shape (nb_nodes, self.dim), where nb_nodes is the product of self.n_bins
-
set(lim, n_bins=10)¶
Set the limits of the grid and the number of bins
Parameters:
lim: list of len(2*self.dim), the limits of the grid as (xmin, xmax, ymin, ymax, …)
n_bins: list of len(self.dim), optional, the number of bins in each direction
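Example (a sketch of building a 2D grid and passing it to GMM.show; the limits and bin counts are arbitrary):
from nipy.algorithms.clustering.gmm import GridDescriptor
gd = GridDescriptor(dim=2)
gd.set(lim=[-3, 3, -3, 3], n_bins=[50, 50])   # (xmin, xmax, ymin, ymax) and bins per axis
grid = gd.make_grid()                         # array of shape (50 * 50, 2)
# gd can then be given to GMM.show(x, gd) or GMM.show_components(x, gd)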
-
Functions¶
-
nipy.algorithms.clustering.gmm.best_fitting_GMM(x, krange, prec_type='full', niter=100, delta=0.0001, ninit=1, verbose=0)¶
Given a certain dataset x, find the best-fitting GMM with a number k of classes in a certain range defined by krange
Parameters:
x: array of shape (n_samples, dim), the data from which the model is estimated
krange: list of ints, the range of values to test for k
prec_type: string (to be chosen within ‘full’, ‘diag’), optional, the covariance parameterization
niter: int, optional, maximal number of iterations in the estimation process
delta: float, optional, increment of data likelihood at which convergence is declared
ninit: int, number of initializations performed
verbose = 0: verbosity mode
Returns: mg, the best-fitting GMM instance
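Example (an illustrative model-selection call over a small range of class numbers; X is a placeholder dataset):
import numpy as np
from nipy.algorithms.clustering.gmm import best_fitting_GMM
X = np.random.randn(500, 2)                    # placeholder dataset
mg = best_fitting_GMM(X, krange=[1, 2, 3, 4], prec_type='full', ninit=3)
print(mg.k)                                    # number of classes of the selected model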
-
nipy.algorithms.clustering.gmm.plot2D(x, my_gmm, z=None, with_dots=True, log_scale=False, mpaxes=None, verbose=0)¶
Given a set of points in a plane and a GMM, plot them
Parameters:
x: array of shape (npoints, dim=2), sample points
my_gmm: GMM instance, whose density has to be plotted
z: array of shape (npoints), optional, that gives a labelling of the points in x; by default, it is not taken into account
with_dots: bool, optional, whether to plot the dots or not
log_scale: bool, optional, whether to plot the likelihood in log scale or not
mpaxes = None: int, optional, if not None, axes handle for plotting
verbose: verbosity mode, optional
Returns:
gd: GridDescriptor instance that represents the grid used in the function
ax: handle to the figure axes
Notes
my_gmm is assumed to have a ‘mixture_likelihood’ method that takes an array of points of shape (np, dim) and returns an array of shape (np, my_gmm.k) that represents the component-wise likelihood
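Example (a plotting sketch, assuming matplotlib is available; X is a placeholder two-dimensional dataset fitted as in the GMM examples above):
import numpy as np
from nipy.algorithms.clustering.gmm import GMM, plot2D
X = np.random.randn(500, 2)                          # placeholder dataset
model = GMM(k=3, dim=2).initialize_and_estimate(X)   # fitted model to be plotted
gd, ax = plot2D(X, model, with_dots=True, log_scale=False)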