SHOGUN  v1.1.0
CKMeans Class Reference

Detailed Description

KMeans clustering partitions the data into k (a priori specified) clusters.

It minimizes

\[ \sum_{i=1}^k\sum_{j\in S_i} \lVert x_j-\mu_i\rVert^2 \]

where $\mu_i$ are the cluster centers and $S_i,\;i=1,\dots,k$ are the index sets of the clusters.
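To make the objective concrete, here is a minimal standalone C++ sketch that evaluates it for a given clustering. It is not Shogun code. It assumes points are stored as std::vector<double>. It also assumes the index sets S_i are encoded as an assignment vector, where assignment[j] = i means j is in S_i. Both representations, and the names squared_distance and kmeans_objective, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: a point (or center) is a dense coordinate vector.
using Point = std::vector<double>;

// Squared Euclidean distance ||a - b||^2.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Within-cluster sum of squares: sum_i sum_{j in S_i} ||x_j - mu_i||^2.
// assignment[j] == i encodes "index j belongs to S_i" (an assumed encoding).
double kmeans_objective(const std::vector<Point>& x,
                        const std::vector<Point>& mu,
                        const std::vector<std::size_t>& assignment) {
    assert(x.size() == assignment.size());
    double total = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        assert(assignment[j] < mu.size());
        total += squared_distance(x[j], mu[assignment[j]]);
    }
    return total;
}
```

For example, the points (0,0), (2,0), (10,0) with centers (1,0), (10,0) and assignment {0, 0, 1} give an objective of 1 + 1 + 0 = 2.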

Beware that this algorithm is only guaranteed to find a local optimum; the result depends on the initial cluster centers.
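The local-optimum caveat can be illustrated with a standalone sketch of Lloyd's algorithm, the classic alternating assign/update scheme for this objective. This is an illustration under stated assumptions, not CKMeans' implementation:

- It uses Euclidean distance.
- Empty clusters keep their previous center.
- It stops when no assignment changes, or after max_iter iterations (mirroring the role of set_max_iter).

The function name lloyd is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Hypothetical Lloyd iteration starting from the centers in mu (updated in place).
// Returns the final assignment; the result depends on the initial centers.
std::vector<std::size_t> lloyd(const std::vector<Point>& x, std::vector<Point>& mu, int max_iter) {
    assert(!mu.empty());
    const std::size_t n = x.size(), k = mu.size(), dim = mu[0].size();
    std::vector<std::size_t> assign(n, 0);
    for (int iter = 0; iter < max_iter; ++iter) {
        bool changed = (iter == 0);  // always run the first update step
        // Assignment step: each point goes to its nearest center.
        for (std::size_t j = 0; j < n; ++j) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::infinity();
            for (std::size_t i = 0; i < k; ++i) {
                const double d = squared_distance(x[j], mu[i]);
                if (d < best_d) { best_d = d; best = i; }
            }
            if (assign[j] != best) { assign[j] = best; changed = true; }
        }
        if (!changed) break;  // converged: centers are already the means
        // Update step: each center moves to the mean of its assigned points.
        std::vector<Point> sum(k, Point(dim, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t d = 0; d < dim; ++d) sum[assign[j]][d] += x[j][d];
            ++count[assign[j]];
        }
        for (std::size_t i = 0; i < k; ++i)
            if (count[i] > 0)  // assumption: empty clusters keep their old center
                for (std::size_t d = 0; d < dim; ++d) mu[i][d] = sum[i][d] / count[i];
    }
    return assign;
}
```

Take the four points (0,0), (0,1), (4,0), (4,1) with k = 2:

- Starting from centers (2,0) and (2,1), the sketch stops immediately at the split by y coordinate, with objective 16.
- Starting from (0,0) and (4,0), it reaches the split by x coordinate, with objective 1.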

cf. http://en.wikipedia.org/wiki/K-means_algorithm

Definition at line 39 of file KMeans.h.

Inheritance diagram for CKMeans (graph not reproduced): CKMeans derives from CDistanceMachine, which derives from CMachine, which derives from CSGObject.

Public Member Functions

 CKMeans ()
 CKMeans (int32_t k, CDistance *d)
virtual ~CKMeans ()
virtual EClassifierType get_classifier_type ()
virtual bool load (FILE *srcfile)
virtual bool save (FILE *dstfile)
void set_k (int32_t p_k)
int32_t get_k ()
void set_max_iter (int32_t iter)
float64_t get_max_iter ()
SGVector< float64_t > get_radiuses ()
SGMatrix< float64_t > get_cluster_centers ()
int32_t get_dimensions ()
virtual const char * get_name () const
- Public Member Functions inherited from CDistanceMachine
 CDistanceMachine ()
virtual ~CDistanceMachine ()
void set_distance (CDistance *d)
CDistance * get_distance ()
void distances_lhs (float64_t *result, int32_t idx_a1, int32_t idx_a2, int32_t idx_b)
void distances_rhs (float64_t *result, int32_t idx_b1, int32_t idx_b2, int32_t idx_a)
virtual CLabels * apply ()
virtual CLabels * apply (CFeatures *data)
virtual float64_t apply (int32_t num)
- Public Member Functions inherited from CMachine
 CMachine ()
virtual ~CMachine ()
virtual bool train (CFeatures *data=NULL)
virtual void set_labels (CLabels *lab)
virtual CLabels * get_labels ()
virtual float64_t get_label (int32_t i)
void set_max_train_time (float64_t t)
float64_t get_max_train_time ()
void set_solver_type (ESolverType st)
ESolverType get_solver_type ()
virtual void set_store_model_features (bool store_model)
- Public Member Functions inherited from CSGObject
 CSGObject ()
 CSGObject (const CSGObject &orig)
virtual ~CSGObject ()
virtual bool is_generic (EPrimitiveType *generic) const
template<class T >
void set_generic ()
void unset_generic ()
virtual void print_serializable (const char *prefix="")
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
void set_global_io (SGIO *io)
SGIO * get_global_io ()
void set_global_parallel (Parallel *parallel)
Parallel * get_global_parallel ()
void set_global_version (Version *version)
Version * get_global_version ()
SGVector< char * > get_modelsel_names ()
char * get_modsel_param_descr (const char *param_name)
index_t get_modsel_param_index (const char *param_name)

Protected Member Functions

void clustknb (bool use_old_mus, float64_t *mus_start)
virtual bool train_machine (CFeatures *data=NULL)
virtual void store_model_features ()

Protected Attributes

int32_t max_iter
 maximum number of iterations
int32_t k
 the k parameter in KMeans
int32_t dimensions
 number of dimensions
SGVector< float64_t > R
 radii of the clusters (size k)
- Protected Attributes inherited from CDistanceMachine
CDistance * distance
- Protected Attributes inherited from CMachine
float64_t max_train_time
CLabels * labels
ESolverType solver_type
bool m_store_model_features

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
Parallel * parallel
Version * version
Parameter * m_parameters
Parameter * m_model_selection_parameters
- Static Protected Member Functions inherited from CDistanceMachine
static void * run_distance_thread_lhs (void *p)
static void * run_distance_thread_rhs (void *p)

Constructor & Destructor Documentation

CKMeans ( )

default constructor

Definition at line 29 of file KMeans.cpp.

CKMeans ( int32_t k,
CDistance * d
)

constructor

Parameters
k  parameter k (number of clusters)
d  distance to use

Definition at line 35 of file KMeans.cpp.

~CKMeans ( )
virtual

Definition at line 43 of file KMeans.cpp.

Member Function Documentation

void clustknb ( bool use_old_mus,
float64_t * mus_start
)
protected

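Internal clustering routine.

Parameters
use_old_mus  whether the old cluster centers (mus) shall be used
mus_start  initial cluster centers (mus)

Implementation notes: replace the rhs feature vectors, set rhs to mus_start, then update rhs.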

Definition at line 179 of file KMeans.cpp.
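The use_old_mus / mus_start switch can be read as a choice between warm-starting from given centers and a fresh initialization. This page does not document how clustknb initializes centers when use_old_mus is false. The sketch below assumes one common choice, drawing k distinct data points at random, which may differ from what Shogun actually does. The helper initial_centers and its seed parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper mirroring the use_old_mus / mus_start switch:
// if use_old_mus is true the supplied centers are returned unchanged,
// otherwise k distinct data points are drawn at random (an assumed strategy).
std::vector<Point> initial_centers(const std::vector<Point>& x, std::size_t k,
                                   bool use_old_mus, const std::vector<Point>& mus_start,
                                   unsigned seed) {
    if (use_old_mus) {
        assert(mus_start.size() == k);
        return mus_start;
    }
    assert(k <= x.size());
    // Shuffle all indices and keep the first k, so the chosen points are distinct.
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    std::vector<Point> mu;
    mu.reserve(k);
    for (std::size_t i = 0; i < k; ++i) mu.push_back(x[idx[i]]);
    return mu;
}
```

Passing a fixed seed makes the random initialization reproducible. Since k-means only finds a local optimum, callers often rerun with several seeds and keep the clustering with the lowest objective.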

virtual EClassifierType get_classifier_type ( )
virtual

get classifier type

Returns
classifier type KMEANS

Reimplemented from CMachine.

Definition at line 57 of file KMeans.h.

SGMatrix< float64_t > get_cluster_centers ( )

get centers

Returns
cluster centers, or an empty matrix if no radii are available (i.e., the machine has not been trained yet)

Definition at line 115 of file KMeans.cpp.
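When reading the returned matrix, it helps to know its layout. We assume it has dimensions rows and k columns, stored column-major with one center per column. We believe this matches Shogun's dense SGMatrix convention, but verify it against your version. Under that assumption, coordinate d of center i sits at offset i * dimensions + d. The helpers below operate on a plain buffer and are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor for a column-major buffer with
// num_rows = dimensions and num_cols = k (one center per column).
double center_coord(const std::vector<double>& data, std::size_t dimensions,
                    std::size_t i, std::size_t d) {
    assert(d < dimensions);
    assert((i + 1) * dimensions <= data.size());
    return data[i * dimensions + d];
}

// Copies center i (column i) out of the same assumed layout.
std::vector<double> center(const std::vector<double>& data, std::size_t dimensions,
                           std::size_t i) {
    assert((i + 1) * dimensions <= data.size());
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(i * dimensions);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(dimensions));
}
```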

int32_t get_dimensions ( )

get dimensions

Returns
number of dimensions

Definition at line 127 of file KMeans.cpp.

int32_t get_k ( )

get k

Returns
the parameter k

Definition at line 94 of file KMeans.cpp.

float64_t get_max_iter ( )

get maximum number of iterations

Returns
maximum number of iterations

Definition at line 105 of file KMeans.cpp.

virtual const char* get_name ( ) const
virtual
Returns
object name

Reimplemented from CDistanceMachine.

Definition at line 116 of file KMeans.h.

SGVector< float64_t > get_radiuses ( )

get radii

Returns
radii of the clusters (vector of length k)

Definition at line 110 of file KMeans.cpp.
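This page does not say how the cluster radii are defined. Purely as an illustration, the sketch below computes one common definition. Under it, a cluster's radius is the largest Euclidean distance from its center to any point assigned to it. Shogun may compute R differently. The function cluster_radii and the assignment-vector encoding are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Illustrative definition (an assumption, not necessarily Shogun's):
// R[i] = max over points j assigned to cluster i of ||x_j - mu_i||.
// Clusters with no assigned points get radius 0.
std::vector<double> cluster_radii(const std::vector<Point>& x, const std::vector<Point>& mu,
                                  const std::vector<std::size_t>& assignment) {
    assert(x.size() == assignment.size());
    std::vector<double> R(mu.size(), 0.0);
    for (std::size_t j = 0; j < x.size(); ++j) {
        const std::size_t i = assignment[j];
        assert(i < mu.size() && x[j].size() == mu[i].size());
        double s = 0.0;
        for (std::size_t d = 0; d < x[j].size(); ++d) {
            const double diff = x[j][d] - mu[i][d];
            s += diff * diff;
        }
        R[i] = std::max(R[i], std::sqrt(s));
    }
    return R;
}
```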

bool load ( FILE *  srcfile)
virtual

load distance machine from file

Parameters
srcfile  file to load from
Returns
whether loading was successful

Reimplemented from CMachine.

Definition at line 73 of file KMeans.cpp.

bool save ( FILE *  dstfile)
virtual

save distance machine to file

Parameters
dstfile  file to save to
Returns
whether saving was successful

Reimplemented from CMachine.

Definition at line 80 of file KMeans.cpp.

void set_k ( int32_t  p_k)

set k

Parameters
p_k  new k

Definition at line 88 of file KMeans.cpp.

void set_max_iter ( int32_t  iter)

set maximum number of iterations

Parameters
iter  the new maximum

Definition at line 99 of file KMeans.cpp.

void store_model_features ( )
protected virtual

Ensures that the cluster centers are stored as the lhs of the underlying distance.

Reimplemented from CDistanceMachine.

Definition at line 464 of file KMeans.cpp.
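Keeping the centers on the lhs of the distance is what lets the inherited apply() label new data by comparing each vector against the centers. The sketch below is a rough standalone illustration of that assignment. It assumes Euclidean distance, and it assumes the label is the index of the nearest center, with ties going to the lower index. nearest_center and assign_labels are hypothetical names, not Shogun functions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Index of the center closest to x. Squared Euclidean distance has the same
// argmin as Euclidean distance, so the square root is skipped.
// Ties are resolved in favor of the lower index (strict < comparison).
std::size_t nearest_center(const Point& x, const std::vector<Point>& mu) {
    assert(!mu.empty());
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < mu.size(); ++i) {
        assert(mu[i].size() == x.size());
        double s = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d) {
            const double diff = x[d] - mu[i][d];
            s += diff * diff;
        }
        if (s < best_d) { best_d = s; best = i; }
    }
    return best;
}

// Labels a batch of vectors by their nearest center (hypothetical helper).
std::vector<std::size_t> assign_labels(const std::vector<Point>& data,
                                       const std::vector<Point>& mu) {
    std::vector<std::size_t> labels;
    labels.reserve(data.size());
    for (const Point& x : data) labels.push_back(nearest_center(x, mu));
    return labels;
}
```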

bool train_machine ( CFeatures * data = NULL)
protected virtual

train k-means

Parameters
data  training data (can be omitted for distance- or kernel-based classifiers whose distance or kernel has already been initialized with the training data)
Returns
whether training was successful

Reimplemented from CMachine.

Definition at line 48 of file KMeans.cpp.

Member Data Documentation

int32_t dimensions
protected

number of dimensions

Definition at line 150 of file KMeans.h.

int32_t k
protected

the k parameter in KMeans

Definition at line 147 of file KMeans.h.

int32_t max_iter
protected

maximum number of iterations

Definition at line 144 of file KMeans.h.

SGVector<float64_t> R
protected

radii of the clusters (size k)

Definition at line 153 of file KMeans.h.



SHOGUN Machine Learning Toolbox - Documentation