algorithms.clustering.imm

Module: algorithms.clustering.imm

Inheritance diagram for nipy.algorithms.clustering.imm:

Inheritance diagram of nipy.algorithms.clustering.imm

Infinite mixture model: a generalization of Bayesian mixture models with an unspecified number of classes

Classes

IMM

class nipy.algorithms.clustering.imm.IMM(alpha=0.5, dim=1)

Bases: nipy.algorithms.clustering.bgmm.BGMM

The class implements the Infinite Gaussian Mixture model, also known as the Dirichlet Process Mixture Model. This is simply a generalization of Bayesian Gaussian Mixture Models with an unknown number of classes.
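
Below is a minimal usage sketch (illustrative, not part of the original nipy documentation); it relies only on the constructor, set_priors and sample as documented in this section, applied to synthetic data:

    import numpy as np
    from nipy.algorithms.clustering.imm import IMM

    # toy data: two well-separated Gaussian blobs in 2D
    rng = np.random.RandomState(0)
    x = np.concatenate((rng.randn(100, 2), rng.randn(100, 2) + 4))

    model = IMM(alpha=0.5, dim=2)  # alpha is the cluster-creation parameter
    model.set_priors(x)            # weakly uninformative priors (see set_priors)
    like = model.sample(x, niter=200, init=True)
    print(model.k)                 # number of components kept (see reduce)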

Methods

average_log_like(x[, tiny]) returns the averaged log-likelihood of the model for the dataset x
bayes_factor(x, z[, nperm, verbose]) Evaluate the Bayes Factor of the current model using Chib’s method
bic(like[, tiny]) Computation of the BIC approximation of the evidence
check() Checking the shape of the different matrices involved in the model
check_x(x) essentially check that x.shape[1]==self.dim
conditional_posterior_proba(x, z[, perm]) Compute the probability of the current parameters of self
cross_validated_update(x, z, plike[, kfold]) This is a step in the sampling procedure
estimate(x[, niter, delta, verbose]) Estimation of the model given a dataset x
evidence(x, z[, nperm, verbose]) See bayes_factor(self, x, z, nperm=0, verbose=0)
guess_priors(x[, nocheck]) Set the priors so that they are weakly uninformative
guess_regularizing(x[, bcheck]) Set the regularizing priors as weakly informative
initialize(x) initialize z using a k-means algorithm, then update the parameters
initialize_and_estimate(x[, z, niter, …]) Estimation of self given x
likelihood(x[, plike]) return the likelihood of the model for the data x
likelihood_under_the_prior(x) Computes the likelihood of x under the prior
map_label(x[, like]) return the MAP labelling of x
mixture_likelihood(x) Returns the likelihood of the mixture for x
plugin(means, precisions, weights) Set manually the weights, means and precision of the model
pop(z) compute the population, i.e. the statistics of allocation
probability_under_prior() Compute the probability of the current parameters of self
reduce(z) Reduce the assignments by removing empty clusters and update self.k
sample(x[, niter, sampling_points, init, …]) sample the indicator and parameters
sample_and_average(x[, niter, verbose]) sample the indicator and parameters
sample_indicator(like) Sample the indicator from the likelihood
set_constant_densities([prior_dens]) Set the null and prior densities as constant
set_priors(x) Set the priors so that they are weakly uninformative
show(x, gd[, density, axes]) Function to plot a GMM, still in progress
show_components(x, gd[, density, mpaxes]) Function to plot a GMM – Currently, works only in 1D
simple_update(x, z, plike) This is a step in the sampling procedure
test(x[, tiny]) Returns the log-likelihood of the mixture for x
train(x[, z, niter, delta, ninit, verbose]) Identical to initialize_and_estimate
unweighted_likelihood(x) return the likelihood of each datum for each component
unweighted_likelihood_(x) return the likelihood of each datum for each component
update(x, z) Update function (draw a sample of the IMM parameters)
update_means(x, z) Given the allocation vector z, resample the means
update_precisions(x, z) Given the allocation vector z, resample the precisions
update_weights(z) Given the allocation vector z, resample the weights parameter
__init__(alpha=0.5, dim=1)
Parameters:

alpha: float, optional, :

the parameter for cluster creation

dim: int, optional, :

the dimension of the data

Note: use the function set_priors() to set adapted priors :

average_log_like(x, tiny=1e-15)

returns the averaged log-likelihood of the model for the dataset x

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities :

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:

x: array of shape (nb_samples,dim) :

the data from which bic is computed

z: array of shape (nb_samples), type = np.int :

the corresponding classification

nperm=0: int :

the number of permutations to sample to model the label switching issue in the computation of the Bayes Factor. By default, exhaustive permutations are used

verbose=0: verbosity mode :

Returns:

bf (float) the computed evidence (Bayes factor) :

Notes

See: Siddhartha Chib, “Marginal Likelihood from the Gibbs Output”, Journal of the American Statistical Association, Vol. 90, 1995
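
A hedged continuation of the sketch above (nperm is arbitrary here); the classification that bayes_factor expects can come from map_label:

    # MAP labelling of the rows of x, then Chib's evidence estimate
    z = model.map_label(x)
    bf = model.bayes_factor(x, z.astype(int), nperm=100)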

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:

like, array of shape (n_samples, self.k) :

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities :

Returns:

the bic value, float :

check()

Checking the shape of the different matrices involved in the model

check_x(x)

essentially check that x.shape[1]==self.dim

x is returned, possibly reshaped

conditional_posterior_proba(x, z, perm=None)

Compute the probability of the current parameters of self given x and z

Parameters:

x: array of shape (nb_samples, dim), :

the data from which bic is computed

z: array of shape (nb_samples), type = np.int, :

the corresponding classification

perm: array of shape (nperm, self.k), type = np.int, optional :

all permutations of z under which things will be recomputed. By default, no permutation is performed

cross_validated_update(x, z, plike, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation

Parameters:

x: array of shape(n_samples, dim), :

the input data

z: array of shape(n_samples), :

the associated membership variables

plike: array of shape(n_samples), :

the likelihood under the prior

kfold: int, or array of shape(n_samples), optional, :

folds in the cross-validation loop

Returns:

like: array of shape(n_samples), :

the (cross-validated) likelihood of the data

estimate(x, niter=100, delta=0.0001, verbose=0)

Estimation of the model given a dataset x

Parameters:

x array of shape (n_samples,dim) :

the data from which the model is estimated

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

verbose=0: verbosity mode :

Returns:

bic : an asymptotic approximation of model evidence

evidence(x, z, nperm=0, verbose=0)

See bayes_factor(self, x, z, nperm=0, verbose=0)

guess_priors(x, nocheck=0)

Set the priors so that they are weakly uninformative; this is from Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (nb_samples,self.dim) :

the data used in the estimation process

nocheck: boolean, optional, :

if nocheck==True, check is skipped

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x array of shape (n_samples,dim) :

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:

x: array of shape (nb_samples,self.dim) :

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Estimation of self given x

Parameters:

x array of shape (n_samples,dim) :

the data from which the model is estimated

z = None: array of shape (n_samples) :

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

ninit=1: number of initializations performed :

to reach a good solution

verbose=0: verbosity mode :

Returns:

the best model is returned :

likelihood(x, plike=None)

return the likelihood of the model for the data x; the values are weighted by the component weights

Parameters:

x: array of shape (n_samples, self.dim), :

the data used in the estimation process

plike: array of shape (n_samples), optional :

the density of each point under the prior

Returns:

like, array of shape(nbitem,self.k) :

component-wise likelihood :

likelihood_under_the_prior(x)

Computes the likelihood of x under the prior

Parameters: x, array of shape (self.n_samples, self.dim)
Returns: w, the likelihood of x under the prior model (unweighted)
map_label(x, like=None)

return the MAP labelling of x

Parameters:

x array of shape (n_samples,dim) :

the data under study

like=None array of shape(n_samples,self.k) :

component-wise likelihood; if like==None, it is recomputed

Returns:

z: array of shape(n_samples), the resulting MAP labelling of the rows of x

mixture_likelihood(x)

Returns the likelihood of the mixture for x

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

plugin(means, precisions, weights)

Set manually the weights, means and precision of the model

Parameters:

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim)

weights: array of shape (self.k) :

pop(z)

compute the population, i.e. the statistics of allocation

Parameters:

z array of shape (nb_samples), type = np.int :

the allocation variable

Returns:

hist: array of shape (self.k), the count variable

probability_under_prior()

Compute the probability of the current parameters of self given the priors

reduce(z)

Reduce the assignments by removing empty clusters and update self.k

Parameters:

z: array of shape(n), :

a vector of membership variables, changed in place

Returns:

z: the remapped values :

sample(x, niter=1, sampling_points=None, init=False, kfold=None, verbose=0)

sample the indicator and parameters

Parameters:

x: array of shape (n_samples, self.dim) :

the data used in the estimation process

niter: int, :

the number of iterations to perform

sampling_points: array of shape(nbpoints, self.dim), optional :

points where the likelihood will be sampled; this defaults to x

kfold: int or array, optional, :

parameter of cross-validation control; by default, no cross-validation is used and the procedure is faster but less accurate

verbose=0: verbosity mode :

Returns:

likelihood: array of shape(nbpoints) :

total likelihood of the model
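
For instance (a sketch reusing model and x from the first example; the fold count is arbitrary):

    # cross-validated sampling: documented as slower but more accurate
    like_cv = model.sample(x, niter=200, init=True, kfold=10)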

sample_and_average(x, niter=1, verbose=0)

sample the indicator and parameters; the average values for weights, means and precisions are returned

Parameters:

x = array of shape (nb_samples,dim) :

the data from which bic is computed

niter=1: number of iterations :

Returns:

weights: array of shape (self.k) :

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim); these are the average parameters across samplings

Notes

All this makes sense only if no label switching has occurred, so this is wrong in general (asymptotically).

To fix: implement a permutation procedure for component identification

sample_indicator(like)

Sample the indicator from the likelihood

Parameters:

like: array of shape (nbitem,self.k) :

component-wise likelihood

Returns:

z: array of shape(nbitem): a draw of the membership variable :

Notes

The behaviour is different from standard bgmm in that z can take arbitrary values

set_constant_densities(prior_dens=None)

Set the null and prior densities as constant (assuming a compact domain)

Parameters:

prior_dens: float, optional :

constant for the prior density

set_priors(x)

Set the priors in order of having them weakly uninformative this is from Fraley and raftery; Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (n_samples,self.dim) :

the data used in the estimation process

show(x, gd, density=None, axes=None)

Function to plot a GMM; still in progress. Currently works only in 1D and 2D

Parameters:

x: array of shape(n_samples, dim) :

the data under study

gd: GridDescriptor instance :

density: array of shape(prod(gd.n_bins)) :

density of the model on the discrete grid implied by gd; by default, this is recomputed

show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM – Currently, works only in 1D

Parameters:

x: array of shape(n_samples, dim) :

the data under study

gd: GridDescriptor instance :

density: array of shape(prod(gd.n_bins)) :

density of the model on the discrete grid implied by gd; by default, this is recomputed

mpaxes: axes handle to make the figure, optional, :

if None, a new figure is created

simple_update(x, z, plike)

One step in the sampling procedure (one data sweep)

Parameters:

x: array of shape(n_samples, dim), :

the input data

z: array of shape(n_samples), :

the associated membership variables

plike: array of shape(n_samples), :

the likelihood under the prior

Returns:

like: array of shape(n_samples), :

the likelihood of the data

test(x, tiny=1e-15)

Returns the log-likelihood of the mixture for x

Parameters:

x array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

ll: array of shape(n_samples) :

the log-likelihood of the rows of x

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Identical to initialize_and_estimate

unweighted_likelihood(x)

return the likelihood of each datum for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k) :

unweighted component-wise likelihood

Notes

Hopefully faster

unweighted_likelihood_(x)

return the likelihood of each datum for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k) :

unweighted component-wise likelihood

update(x, z)

Update function (draw a sample of the IMM parameters)

Parameters:

x array of shape (n_samples,self.dim) :

the data used in the estimation process

z array of shape (n_samples), type = np.int :

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the means

Parameters:

x: array of shape (nb_samples,self.dim) :

the data used in the estimation process

z: array of shape (nb_samples), type = np.int :

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:

x array of shape (nb_samples,self.dim) :

the data used in the estimation process

z array of shape (nb_samples), type = np.int :

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:

z array of shape (n_samples), type = np.int :

the allocation variable

MixedIMM

class nipy.algorithms.clustering.imm.MixedIMM(alpha=0.5, dim=1)

Bases: nipy.algorithms.clustering.imm.IMM

Particular IMM with an additional null class. The data is supplied together with a sample-related probability of being under the null.
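
A hedged usage sketch, analogous to the IMM example above; null_dens and the flat null prior of 0.5 are illustrative choices, and the two-value unpacking assumes sample returns the two values documented below when co_clustering is False:

    import numpy as np
    from nipy.algorithms.clustering.imm import MixedIMM

    # toy 1D data: background noise plus a small shifted group
    rng = np.random.RandomState(1)
    x = np.concatenate((rng.randn(200, 1), rng.randn(30, 1) + 5))

    model = MixedIMM(alpha=0.5, dim=1)
    model.set_priors(x)
    model.set_constant_densities(null_dens=0.1)  # illustrative constant
    null_proba = 0.5 * np.ones(x.shape[0])       # flat prior P(null) per sample
    like, pproba = model.sample(x, null_proba, niter=200, init=True)
    # pproba: posterior probability of each sample being under the null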

Methods

average_log_like(x[, tiny]) returns the averaged log-likelihood of the model for the dataset x
bayes_factor(x, z[, nperm, verbose]) Evaluate the Bayes Factor of the current model using Chib’s method
bic(like[, tiny]) Computation of the BIC approximation of the evidence
check() Checking the shape of the different matrices involved in the model
check_x(x) essentially check that x.shape[1]==self.dim
conditional_posterior_proba(x, z[, perm]) Compute the probability of the current parameters of self
cross_validated_update(x, z, plike, …[, kfold]) This is a step in the sampling procedure
estimate(x[, niter, delta, verbose]) Estimation of the model given a dataset x
evidence(x, z[, nperm, verbose]) See bayes_factor(self, x, z, nperm=0, verbose=0)
guess_priors(x[, nocheck]) Set the priors so that they are weakly uninformative
guess_regularizing(x[, bcheck]) Set the regularizing priors as weakly informative
initialize(x) initialize z using a k-means algorithm, then update the parameters
initialize_and_estimate(x[, z, niter, …]) Estimation of self given x
likelihood(x[, plike]) return the likelihood of the model for the data x
likelihood_under_the_prior(x) Computes the likelihood of x under the prior
map_label(x[, like]) return the MAP labelling of x
mixture_likelihood(x) Returns the likelihood of the mixture for x
plugin(means, precisions, weights) Set manually the weights, means and precision of the model
pop(z) compute the population, i.e. the statistics of allocation
probability_under_prior() Compute the probability of the current parameters of self
reduce(z) Reduce the assignments by removing empty clusters and update self.k
sample(x, null_class_proba[, niter, …]) sample the indicator and parameters
sample_and_average(x[, niter, verbose]) sample the indicator and parameters
sample_indicator(like, null_class_proba) sample the indicator from the likelihood
set_constant_densities([null_dens, prior_dens]) Set the null and prior densities as constant
set_priors(x) Set the priors so that they are weakly uninformative
show(x, gd[, density, axes]) Function to plot a GMM, still in progress
show_components(x, gd[, density, mpaxes]) Function to plot a GMM – Currently, works only in 1D
simple_update(x, z, plike, null_class_proba) One step in the sampling procedure (one data sweep)
test(x[, tiny]) Returns the log-likelihood of the mixture for x
train(x[, z, niter, delta, ninit, verbose]) Identical to initialize_and_estimate
unweighted_likelihood(x) return the likelihood of each datum for each component
unweighted_likelihood_(x) return the likelihood of each datum for each component
update(x, z) Update function (draw a sample of the IMM parameters)
update_means(x, z) Given the allocation vector z, resample the means
update_precisions(x, z) Given the allocation vector z, resample the precisions
update_weights(z) Given the allocation vector z, resample the weights parameter
__init__(alpha=0.5, dim=1)
Parameters:

alpha: float, optional, :

the parameter for cluster creation

dim: int, optional, :

the dimension of the data

Note: use the function set_priors() to set adapted priors :

average_log_like(x, tiny=1e-15)

returns the averaged log-likelihood of the model for the dataset x

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

tiny = 1.e-15: a small constant to avoid numerical singularities :

bayes_factor(x, z, nperm=0, verbose=0)

Evaluate the Bayes Factor of the current model using Chib’s method

Parameters:

x: array of shape (nb_samples,dim) :

the data from which bic is computed

z: array of shape (nb_samples), type = np.int :

the corresponding classification

nperm=0: int :

the number of permutations to sample to model the label switching issue in the computation of the Bayes Factor. By default, exhaustive permutations are used

verbose=0: verbosity mode :

Returns:

bf (float) the computed evidence (Bayes factor) :

Notes

See: Siddhartha Chib, “Marginal Likelihood from the Gibbs Output”, Journal of the American Statistical Association, Vol. 90, 1995

bic(like, tiny=1e-15)

Computation of the BIC approximation of the evidence

Parameters:

like, array of shape (n_samples, self.k) :

component-wise likelihood

tiny=1.e-15, a small constant to avoid numerical singularities :

Returns:

the bic value, float :

check()

Checking the shape of the different matrices involved in the model

check_x(x)

essentially check that x.shape[1]==self.dim

x is returned, possibly reshaped

conditional_posterior_proba(x, z, perm=None)

Compute the probability of the current parameters of self given x and z

Parameters:

x: array of shape (nb_samples, dim), :

the data from which bic is computed

z: array of shape (nb_samples), type = np.int, :

the corresponding classification

perm: array of shape (nperm, self.k), type = np.int, optional :

all permutations of z under which things will be recomputed. By default, no permutation is performed

cross_validated_update(x, z, plike, null_class_proba, kfold=10)

This is a step in the sampling procedure that uses internal cross-validation

Parameters:

x: array of shape(n_samples, dim), :

the input data

z: array of shape(n_samples), :

the associated membership variables

plike: array of shape(n_samples), :

the likelihood under the prior

kfold: int or array, optional :

number of folds in the cross-validation loop, or set of indices for the cross-validation procedure

null_class_proba: array of shape(n_samples), :

prior probability to be under the null

Returns:

like: array of shape(n_samples), :

the (cross-validated) likelihood of the data

z: array of shape(n_samples), :

the associated membership variables

Notes

When kfold is an array, there is an internal reshuffling to randomize the order of updates

estimate(x, niter=100, delta=0.0001, verbose=0)

Estimation of the model given a dataset x

Parameters:

x array of shape (n_samples,dim) :

the data from which the model is estimated

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

verbose=0: verbosity mode :

Returns:

bic : an asymptotic approximation of model evidence

evidence(x, z, nperm=0, verbose=0)

See bayes_factor(self, x, z, nperm=0, verbose=0)

guess_priors(x, nocheck=0)

Set the priors so that they are weakly uninformative; this is from Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (nb_samples,self.dim) :

the data used in the estimation process

nocheck: boolean, optional, :

if nocheck==True, check is skipped

guess_regularizing(x, bcheck=1)

Set the regularizing priors as weakly informative, according to Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x array of shape (n_samples,dim) :

the data used in the estimation process

initialize(x)

initialize z using a k-means algorithm, then update the parameters

Parameters:

x: array of shape (nb_samples,self.dim) :

the data used in the estimation process

initialize_and_estimate(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Estimation of self given x

Parameters:

x array of shape (n_samples,dim) :

the data from which the model is estimated

z = None: array of shape (n_samples) :

a prior labelling of the data to initialize the computation

niter=100: maximal number of iterations in the estimation process :

delta = 1.e-4: increment of data likelihood at which :

convergence is declared

ninit=1: number of initializations performed :

to reach a good solution

verbose=0: verbosity mode :

Returns:

the best model is returned :

likelihood(x, plike=None)

return the likelihood of the model for the data x; the values are weighted by the component weights

Parameters:

x: array of shape (n_samples, self.dim), :

the data used in the estimation process

plike: array of shape (n_samples), optional :

the density of each point under the prior

Returns:

like, array of shape(nbitem,self.k) :

component-wise likelihood :

likelihood_under_the_prior(x)

Computes the likelihood of x under the prior

Parameters: x, array of shape (self.n_samples, self.dim)
Returns: w, the likelihood of x under the prior model (unweighted)
map_label(x, like=None)

return the MAP labelling of x

Parameters:

x array of shape (n_samples,dim) :

the data under study

like=None array of shape(n_samples,self.k) :

component-wise likelihood; if like==None, it is recomputed

Returns:

z: array of shape(n_samples), the resulting MAP labelling of the rows of x

mixture_likelihood(x)

Returns the likelihood of the mixture for x

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

plugin(means, precisions, weights)

Set manually the weights, means and precision of the model

Parameters:

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim)

weights: array of shape (self.k) :

pop(z)

compute the population, i.e. the statistics of allocation

Parameters:

z array of shape (nb_samples), type = np.int :

the allocation variable

Returns:

hist: array of shape (self.k), the count variable

probability_under_prior()

Compute the probability of the current parameters of self given the priors

reduce(z)

Reduce the assignments by removing empty clusters and update self.k

Parameters:

z: array of shape(n), :

a vector of membership variables, changed in place

Returns:

z: the remapped values :

sample(x, null_class_proba, niter=1, sampling_points=None, init=False, kfold=None, co_clustering=False, verbose=0)

sample the indicator and parameters

Parameters:

x: array of shape (n_samples, self.dim), :

the data used in the estimation process

null_class_proba: array of shape(n_samples), :

the probability to be under the null

niter: int, :

the number of iterations to perform

sampling_points: array of shape(nbpoints, self.dim), optional :

points where the likelihood will be sampled; this defaults to x

kfold: int, optional, :

parameter of cross-validation control; by default, no cross-validation is used and the procedure is faster but less accurate

co_clustering: bool, optional :

if True, return a model of data co-labelling across iterations

verbose=0: verbosity mode :

Returns:

likelihood: array of shape(nbpoints) :

total likelihood of the model

pproba: array of shape(n_samples), :

the posterior probability of being in the null (the posterior counterpart of null_class_proba)

coclust: only if co_clustering==True, :

sparse_matrix of shape (n_samples, n_samples), frequency of co-labelling of each sample pair across iterations
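
A sketch of the co-clustering output, continuing the MixedIMM example above; the three-value unpacking assumes the returns listed here:

    # the same call with co_clustering=True also returns the co-labelling matrix
    like, pproba, coclust = model.sample(x, null_proba, niter=200,
                                         init=True, co_clustering=True)
    frequency = coclust.todense()  # documented as a sparse matrix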

sample_and_average(x, niter=1, verbose=0)

sample the indicator and parameters; the average values for weights, means and precisions are returned

Parameters:

x = array of shape (nb_samples,dim) :

the data from which bic is computed

niter=1: number of iterations :

Returns:

weights: array of shape (self.k) :

means: array of shape (self.k,self.dim) :

precisions: array of shape (self.k,self.dim,self.dim) :

or (self.k, self.dim); these are the average parameters across samplings

Notes

All this makes sense only if no label switching has occurred, so this is wrong in general (asymptotically).

To fix: implement a permutation procedure for component identification

sample_indicator(like, null_class_proba)

sample the indicator from the likelihood

Parameters:

like: array of shape (nbitem,self.k) :

component-wise likelihood

null_class_proba: array of shape(n_samples), :

prior probability to be under the null

Returns:

z: array of shape(nbitem): a draw of the membership variable :

Notes

Here z == -1 encodes the null class
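
For instance (a hedged sketch continuing the MixedIMM example above; it assumes likelihood(x) provides the component-wise array this method expects):

    # component-wise likelihood, then one draw of the indicator
    like_k = model.likelihood(x)
    z = model.sample_indicator(like_k, null_proba)
    signal = x[z != -1]  # z == -1 marks the null class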

set_constant_densities(null_dens=None, prior_dens=None)

Set the null and prior densities as constant (over a supposedly compact domain)

Parameters:

null_dens: float, optional :

constant for the null density

prior_dens: float, optional :

constant for the prior density

set_priors(x)

Set the priors so that they are weakly uninformative; this is from Fraley and Raftery, Journal of Classification 24:155-181 (2007)

Parameters:

x, array of shape (n_samples,self.dim) :

the data used in the estimation process

show(x, gd, density=None, axes=None)

Function to plot a GMM; still in progress. Currently works only in 1D and 2D

Parameters:

x: array of shape(n_samples, dim) :

the data under study

gd: GridDescriptor instance :

density: array of shape(prod(gd.n_bins)) :

density of the model on the discrete grid implied by gd; by default, this is recomputed

show_components(x, gd, density=None, mpaxes=None)

Function to plot a GMM – Currently, works only in 1D

Parameters:

x: array of shape(n_samples, dim) :

the data under study

gd: GridDescriptor instance :

density: array of shape(prod(gd.n_bins)) :

density of the model on the discrete grid implied by gd; by default, this is recomputed

mpaxes: axes handle to make the figure, optional, :

if None, a new figure is created

simple_update(x, z, plike, null_class_proba)

One step in the sampling procedure (one data sweep)

Parameters:

x: array of shape(n_samples, dim), :

the input data

z: array of shape(n_samples), :

the associated membership variables

plike: array of shape(n_samples), :

the likelihood under the prior

null_class_proba: array of shape(n_samples), :

prior probability to be under the null

Returns:

like: array of shape(n_samples), :

the likelihood of the data under the H1 hypothesis

test(x, tiny=1e-15)

Returns the log-likelihood of the mixture for x

Parameters:

x array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

ll: array of shape(n_samples) :

the log-likelihood of the rows of x

train(x, z=None, niter=100, delta=0.0001, ninit=1, verbose=0)

Identical to initialize_and_estimate

unweighted_likelihood(x)

return the likelihood of each datum for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k) :

unweighted component-wise likelihood

Notes

Hopefully faster

unweighted_likelihood_(x)

return the likelihood of each datum for each component; the values are not weighted by the component weights

Parameters:

x: array of shape (n_samples,self.dim) :

the data used in the estimation process

Returns:

like, array of shape(n_samples,self.k) :

unweighted component-wise likelihood

update(x, z)

Update function (draw a sample of the IMM parameters)

Parameters:

x array of shape (n_samples,self.dim) :

the data used in the estimation process

z array of shape (n_samples), type = np.int :

the corresponding classification

update_means(x, z)

Given the allocation vector z, and the corresponding data x, resample the means

Parameters:

x: array of shape (nb_samples,self.dim) :

the data used in the estimation process

z: array of shape (nb_samples), type = np.int :

the corresponding classification

update_precisions(x, z)

Given the allocation vector z, and the corresponding data x, resample the precisions

Parameters:

x array of shape (nb_samples,self.dim) :

the data used in the estimation process

z array of shape (nb_samples), type = np.int :

the corresponding classification

update_weights(z)

Given the allocation vector z, resample the weights parameter

Parameters:

z array of shape (n_samples), type = np.int :

the allocation variable

Functions

nipy.algorithms.clustering.imm.co_labelling(z, kmax=None, kmin=None)

return a sparse co-labelling matrix given the label vector z

Parameters:

z: array of shape(n_samples), :

the input labels

kmax: int, optional, :

considers only the labels in the range [0, kmax[

Returns:

colabel: a sparse coo_matrix, :

yields the co-labelling of the data, i.e. c[i, j] = 1 if z[i] == z[j], 0 otherwise
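
A small sketch of the expected behaviour:

    import numpy as np
    from nipy.algorithms.clustering.imm import co_labelling

    z = np.array([0, 0, 1, 1, 2])
    colabel = co_labelling(z)  # sparse coo_matrix, per the Returns above
    print(colabel.toarray())   # 1 where z[i] == z[j], 0 elsewhere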

nipy.algorithms.clustering.imm.main()

Illustrative example of the behaviour of imm