SHOGUN  v1.1.0
CCommUlongStringKernel Class Reference

Detailed Description

The CommUlongString kernel computes the spectrum kernel from strings that have been mapped into unsigned 64-bit integers.

These 64-bit integers correspond to k-mers. To be usable with this kernel they must be sorted (e.g. via the SortUlongString pre-processor).
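To make the k-mer representation concrete, the following is a minimal sketch of packing the k-mers of a DNA string into 64-bit words and sorting them. It is not Shogun's own conversion code: the helper names `dna_code` and `sorted_kmers` and the 2-bit encoding (A=0, C=1, G=2, T=3) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: map A, C, G, T to the 2-bit codes 0..3.
inline uint64_t dna_code(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "unexpected symbol");
    return 0;
}

// Pack every k-mer of seq into one uint64_t (2 bits per symbol, k <= 32)
// and return the packed k-mers in ascending order.
inline std::vector<uint64_t> sorted_kmers(const std::string& seq, int k)
{
    assert(k >= 1 && k <= 32);
    std::vector<uint64_t> out;
    if (seq.size() < static_cast<std::size_t>(k))
        return out;

    // Keep only the lowest 2*k bits; k == 32 uses the full word.
    const uint64_t mask = (k == 32) ? ~0ULL : ((1ULL << (2 * k)) - 1);
    uint64_t word = 0;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        // Slide the window by one symbol.
        word = ((word << 2) | dna_code(seq[i])) & mask;
        if (i + 1 >= static_cast<std::size_t>(k))
            out.push_back(word);
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

With this assumed encoding, `sorted_kmers("ACGT", 2)` yields the packed values for AC, CG and GT in ascending order. Repeated k-mers stay in the list as repeated values, so counts are implicit in the run lengths.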

It uses the merge procedure of the Unix "comm" command (hence the name) to compute:

\[ k({\bf x},{\bf x'})= \Phi_k({\bf x})\cdot \Phi_k({\bf x'}) \]

where $\Phi_k$ maps a sequence ${\bf x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector each entry denotes how often the corresponding k-mer appears in ${\bf x}$.
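The dot product above never needs the full $|\Sigma|^k$-dimensional vector: walking two sorted k-mer lists in lockstep, as "comm" does with sorted files, is enough. The sketch below illustrates the idea and is not the library's implementation. The function name `comm_spectrum_kernel` is hypothetical, and the `use_sign` behaviour shown (each shared k-mer contributes 1 instead of the product of its counts) is an assumption about what "if sign shall be used" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Spectrum kernel value for two ascending lists of packed k-mers.
// A run of equal values encodes the count of that k-mer. Each k-mer
// present in both lists contributes count_a * count_b, or 1 when
// use_sign is true (assumed semantics).
inline double comm_spectrum_kernel(const std::vector<uint64_t>& a,
                                   const std::vector<uint64_t>& b,
                                   bool use_sign = false)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;                       // k-mer only in a
        else if (a[i] > b[j])
            ++j;                       // k-mer only in b
        else
        {
            // k-mer in both: measure the length of each run.
            const uint64_t sym = a[i];
            std::size_t ca = 0, cb = 0;
            while (i < a.size() && a[i] == sym) { ++i; ++ca; }
            while (j < b.size() && b[j] == sym) { ++j; ++cb; }
            result += use_sign ? 1.0
                               : static_cast<double>(ca) * static_cast<double>(cb);
        }
    }
    return result;
}
```

The cost is linear in the combined length of the two lists, which is why the inputs must be sorted beforehand.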

Note that this representation allows spectrum kernels of up to order 8 for 8-bit alphabets (like binaries) and up to order 32 for 2-bit alphabets like DNA, because all k symbols of a k-mer must fit into a single 64-bit integer.
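The order limits follow from dividing the 64 available bits by the bits needed per symbol. As a small worked example (the function name `max_spectrum_order` is hypothetical):

```cpp
#include <cassert>

// Largest spectrum order k such that k symbols of bits_per_symbol bits
// each fit into a single 64-bit word.
inline int max_spectrum_order(int bits_per_symbol)
{
    assert(bits_per_symbol >= 1 && bits_per_symbol <= 64);
    return 64 / bits_per_symbol;
}
```

For 8 bits per symbol this gives 8, and for 2 bits per symbol it gives 32, matching the figures above.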

For this kernel the linadd speedups are implemented (though there is room for improvement here when a whole set of sequences is ADDed) using sorted lists.
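The linadd idea can be sketched as follows: instead of evaluating the kernel against every support vector, the weighted feature vectors are accumulated once into a sorted dictionary of k-mers and weights, and a new example is then scored with a single merge. This is a simplified illustration under stated assumptions, not the code behind `add_to_normal` and `compute_optimized`. The type `WeightedDictionary` and the functions `add_to_dictionary` and `dictionary_score` are hypothetical, and `use_sign` and normalization are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel's dictionary and dictionary weights:
// ascending unique k-mers, each with one accumulated weight.
struct WeightedDictionary
{
    std::vector<uint64_t> kmers;
    std::vector<double> weights;
};

// Add weight * Phi_k(vec) to the dictionary by merging two sorted lists.
inline void add_to_dictionary(WeightedDictionary& dic,
                              const std::vector<uint64_t>& vec, double weight)
{
    assert(std::is_sorted(vec.begin(), vec.end()));
    WeightedDictionary merged;
    std::size_t d = 0, v = 0;
    while (d < dic.kmers.size() || v < vec.size())
    {
        // Pick the smaller head of the two lists as the next k-mer.
        const bool take_dic = v == vec.size() ||
            (d < dic.kmers.size() && dic.kmers[d] <= vec[v]);
        const uint64_t sym = take_dic ? dic.kmers[d] : vec[v];

        double w = 0.0;
        if (d < dic.kmers.size() && dic.kmers[d] == sym)
            w += dic.weights[d++];                       // existing weight
        while (v < vec.size() && vec[v] == sym) { w += weight; ++v; }

        merged.kmers.push_back(sym);
        merged.weights.push_back(w);
    }
    dic = merged;
}

// Evaluate sum_i weight_i * k(x_i, x) with one pass over x and the dictionary.
inline double dictionary_score(const WeightedDictionary& dic,
                               const std::vector<uint64_t>& vec)
{
    assert(std::is_sorted(vec.begin(), vec.end()));
    double result = 0.0;
    std::size_t d = 0, v = 0;
    while (d < dic.kmers.size() && v < vec.size())
    {
        if (dic.kmers[d] < vec[v])
            ++d;
        else if (dic.kmers[d] > vec[v])
            ++v;
        else
        {
            result += dic.weights[d];  // one occurrence of the k-mer in x
            ++v;
        }
    }
    return result;
}
```

In this sketch every added sequence triggers a full merge of the dictionary, which mirrors the remark that adding a whole set of sequences at once leaves room for improvement.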

Definition at line 48 of file CommUlongStringKernel.h.

Inheritance diagram for CCommUlongStringKernel:
CCommUlongStringKernel -> CStringKernel< uint64_t > -> CKernel -> CSGObject

Public Member Functions

 CCommUlongStringKernel (int32_t size=10, bool use_sign=false)
 CCommUlongStringKernel (CStringFeatures< uint64_t > *l, CStringFeatures< uint64_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CCommUlongStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual void cleanup ()
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual bool init_optimization (int32_t count, int32_t *IDX, float64_t *weights)
virtual bool delete_optimization ()
virtual float64_t compute_optimized (int32_t idx)
void merge_dictionaries (int32_t &t, int32_t j, int32_t &k, uint64_t *vec, uint64_t *dic, float64_t *dic_weights, float64_t weight, int32_t vec_idx)
virtual void add_to_normal (int32_t idx, float64_t weight)
virtual void clear_normal ()
virtual void remove_lhs ()
virtual void remove_rhs ()
virtual EFeatureType get_feature_type ()
void get_dictionary (int32_t &dsize, uint64_t *&dict, float64_t *&dweights)
- Public Member Functions inherited from CStringKernel< uint64_t >
 CStringKernel (int32_t cachesize=0)
 CStringKernel (CFeatures *l, CFeatures *r)
virtual EFeatureClass get_feature_class ()
- Public Member Functions inherited from CKernel
 CKernel ()
 CKernel (int32_t size)
 CKernel (CFeatures *l, CFeatures *r, int32_t size)
virtual ~CKernel ()
float64_t kernel (int32_t idx_a, int32_t idx_b)
SGMatrix< float64_t > get_kernel_matrix ()
virtual SGVector< float64_t > get_kernel_col (int32_t j)
virtual SGVector< float64_t > get_kernel_row (int32_t i)
template<class T >
SGMatrix< T > get_kernel_matrix ()
virtual bool set_normalizer (CKernelNormalizer *normalizer)
virtual CKernelNormalizer * get_normalizer ()
virtual bool init_normalizer ()
void load (CFile *loader)
void save (CFile *writer)
CFeatures * get_lhs ()
CFeatures * get_rhs ()
virtual int32_t get_num_vec_lhs ()
virtual int32_t get_num_vec_rhs ()
virtual bool has_features ()
bool get_lhs_equals_rhs ()
virtual void remove_lhs_and_rhs ()
void set_cache_size (int32_t size)
int32_t get_cache_size ()
void list_kernel ()
bool has_property (EKernelProperty p)
EOptimizationType get_optimization_type ()
virtual void set_optimization_type (EOptimizationType t)
bool get_is_initialized ()
bool init_optimization_svm (CSVM *svm)
virtual void compute_batch (int32_t num_vec, int32_t *vec_idx, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, float64_t factor=1.0)
float64_t get_combined_kernel_weight ()
void set_combined_kernel_weight (float64_t nw)
virtual int32_t get_num_subkernels ()
virtual void compute_by_subkernel (int32_t vector_idx, float64_t *subkernel_contrib)
virtual const float64_t * get_subkernel_weights (int32_t &num_weights)
virtual void set_subkernel_weights (SGVector< float64_t > weights)
- Public Member Functions inherited from CSGObject
 CSGObject ()
 CSGObject (const CSGObject &orig)
virtual ~CSGObject ()
virtual bool is_generic (EPrimitiveType *generic) const
template<class T >
void set_generic ()
void unset_generic ()
virtual void print_serializable (const char *prefix="")
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
void set_global_io (SGIO *io)
SGIO * get_global_io ()
void set_global_parallel (Parallel *parallel)
Parallel * get_global_parallel ()
void set_global_version (Version *version)
Version * get_global_version ()
SGVector< char * > get_modelsel_names ()
char * get_modsel_param_descr (const char *param_name)
index_t get_modsel_param_index (const char *param_name)

Protected Member Functions

float64_t compute (int32_t idx_a, int32_t idx_b)

Protected Attributes

CDynamicArray< uint64_t > dictionary
CDynamicArray< float64_t > dictionary_weights
bool use_sign

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
Parallel * parallel
Version * version
Parameter * m_parameters
Parameter * m_model_selection_parameters
- Static Protected Member Functions inherited from CKernel
template<class T >
static void * get_kernel_matrix_helper (void *p)

Constructor & Destructor Documentation

CCommUlongStringKernel ( int32_t  size = 10,
bool  use_sign = false 
)

constructor

Parameters
size      cache size
use_sign  if sign shall be used

Definition at line 19 of file CommUlongStringKernel.cpp.

CCommUlongStringKernel ( CStringFeatures< uint64_t > *  l,
CStringFeatures< uint64_t > *  r,
bool  use_sign = false,
int32_t  size = 10 
)

constructor

Parameters
l         features of left-hand side
r         features of right-hand side
use_sign  if sign shall be used
size      cache size

Definition at line 28 of file CommUlongStringKernel.cpp.

~CCommUlongStringKernel ( )
virtual

Definition at line 39 of file CommUlongStringKernel.cpp.

Member Function Documentation

void add_to_normal ( int32_t  idx,
float64_t  weight 
)
virtual

add to normal

Parameters
idx     where to add
weight  what to add

Reimplemented from CKernel.

Definition at line 145 of file CommUlongStringKernel.cpp.

void cleanup ( )
virtual

clean up kernel

Reimplemented from CKernel.

Definition at line 73 of file CommUlongStringKernel.cpp.

void clear_normal ( )
virtual

clear normal

Reimplemented from CKernel.

Definition at line 210 of file CommUlongStringKernel.cpp.

float64_t compute ( int32_t  idx_a,
int32_t  idx_b 
)
protected virtual

Compute the kernel function for features a and b. idx_a and idx_b denote the indices of the feature vectors in the corresponding feature objects.

Parameters
idx_a  index a
idx_b  index b
Returns
computed kernel function at indices a,b

Implements CKernel.

Definition at line 80 of file CommUlongStringKernel.cpp.

float64_t compute_optimized ( int32_t  idx)
virtual

compute optimized

Parameters
idx  index to compute
Returns
optimized value at given index

Reimplemented from CKernel.

Definition at line 254 of file CommUlongStringKernel.cpp.

bool delete_optimization ( )
virtual

delete optimization

Returns
if deleting was successful

Reimplemented from CKernel.

Definition at line 245 of file CommUlongStringKernel.cpp.

void get_dictionary ( int32_t &  dsize,
uint64_t *&  dict,
float64_t *&  dweights 
)

get dictionary

Parameters
dsize     dictionary size will be stored in here
dict      dictionary will be stored in here
dweights  dictionary weights will be stored in here

Definition at line 183 of file CommUlongStringKernel.h.

virtual EFeatureType get_feature_type ( )
virtual

return feature type the kernel can deal with

Returns
feature type ULONG

Reimplemented from CStringKernel< uint64_t >.

Definition at line 175 of file CommUlongStringKernel.h.

virtual EKernelType get_kernel_type ( )
virtual

return what type of kernel we are

Returns
kernel type COMMULONGSTRING

Implements CStringKernel< uint64_t >.

Definition at line 87 of file CommUlongStringKernel.h.

virtual const char* get_name ( ) const
virtual

return the kernel's name

Returns
name CommUlongString

Reimplemented from CStringKernel< uint64_t >.

Definition at line 93 of file CommUlongStringKernel.h.

bool init ( CFeatures *  l,
CFeatures *  r 
)
virtual

initialize kernel

Parameters
l  features of left-hand side
r  features of right-hand side
Returns
if initializing was successful

Reimplemented from CStringKernel< uint64_t >.

Definition at line 67 of file CommUlongStringKernel.cpp.

bool init_optimization ( int32_t  count,
int32_t *  IDX,
float64_t *  weights 
)
virtual

initialize optimization

Parameters
count    count
IDX      index
weights  weights
Returns
if initializing was successful

Reimplemented from CKernel.

Definition at line 217 of file CommUlongStringKernel.cpp.

void merge_dictionaries ( int32_t &  t,
int32_t  j,
int32_t &  k,
uint64_t *  vec,
uint64_t *  dic,
float64_t *  dic_weights,
float64_t  weight,
int32_t  vec_idx 
)

merge dictionaries

Parameters
t            t
j            j
k            k
vec          vector
dic          dictionary
dic_weights  dictionary weights
weight       weight
vec_idx      vector index

Definition at line 129 of file CommUlongStringKernel.h.

void remove_lhs ( )
virtual

remove lhs from kernel

Reimplemented from CKernel.

Definition at line 44 of file CommUlongStringKernel.cpp.

void remove_rhs ( )
virtual

remove rhs from kernel

Reimplemented from CKernel.

Definition at line 57 of file CommUlongStringKernel.cpp.

Member Data Documentation

CDynamicArray<uint64_t> dictionary
protected

dictionary

Definition at line 204 of file CommUlongStringKernel.h.

CDynamicArray<float64_t> dictionary_weights
protected

dictionary weights

Definition at line 206 of file CommUlongStringKernel.h.

bool use_sign
protected

if sign shall be used

Definition at line 209 of file CommUlongStringKernel.h.


The documentation for this class was generated from the following files:

CommUlongStringKernel.h
CommUlongStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation