Gaussian Naive Bayes Classifier.
GNB is a probabilistic classifier that relies on Bayes' rule to estimate posterior probabilities of labels given the data. Its naive assumption is that the features are independent, which allows per-feature likelihoods to be combined by a simple product across the “independent” features. See http://en.wikipedia.org/wiki/Naive_bayes for more information.
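As an illustration of the product rule described above, here is a minimal pure-Python sketch, not this classifier's actual implementation. The helper names (`gaussian_logpdf`, `log_joint`) and all numbers are hypothetical. It works in log space, so the product of per-feature likelihoods becomes a sum.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def log_joint(sample, means, variances, priors):
    """Return log P(label) + sum_j log p(x_j | label) for every label.

    The sum over features is the log of the "naive" product of
    per-feature likelihoods; the result is not normalized by P(data).
    """
    scores = {}
    for label in priors:
        loglik = sum(
            gaussian_logpdf(x, m, v)
            for x, m, v in zip(sample, means[label], variances[label])
        )
        scores[label] = math.log(priors[label]) + loglik
    return scores

# Hypothetical two-class, two-feature model (made-up parameters).
means = {"a": [0.0, 0.0], "b": [2.0, 2.0]}
variances = {"a": [1.0, 1.0], "b": [1.0, 1.0]}
priors = {"a": 0.5, "b": 0.5}

scores = log_joint([0.2, -0.1], means, variances, priors)
predicted = max(scores, key=scores.get)  # label with the highest score
```

The predicted label is simply the one with the largest score; the sample here lies close to the mean of class "a".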
The implementation provided here is itself “naive”: various aspects could be improved, but it has its own advantages.
GNB is listed as both a linear and a non-linear classifier, since the specifics of the separating boundary depend on the data and/or parameters: linear separation is achieved whenever samples are balanced (or prior='uniform') and features have the same variance across different classes (i.e. if common_variance=True is set to enforce this).
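The linearity claim can be checked with a short derivation. The notation is introduced here for illustration only: \mu_{cj} and \sigma_{cj}^2 denote the mean and variance of feature j in class c, and a, b are two classes.

```latex
\begin{align*}
\log p(x \mid c) &= \sum_j \left[ -\tfrac{1}{2}\log\!\left(2\pi\sigma_{cj}^2\right)
                    - \frac{(x_j - \mu_{cj})^2}{2\sigma_{cj}^2} \right] \\
\log\frac{P(a \mid x)}{P(b \mid x)}
  &= \log\frac{P(a)}{P(b)} + \log p(x \mid a) - \log p(x \mid b) \\
  &= \log\frac{P(a)}{P(b)} + \sum_j \left[ \frac{\mu_{aj} - \mu_{bj}}{\sigma_j^2}\, x_j
     - \frac{\mu_{aj}^2 - \mu_{bj}^2}{2\sigma_j^2} \right]
  \quad \text{when } \sigma_{aj} = \sigma_{bj} = \sigma_j
\end{align*}
```

With a shared variance per feature, the quadratic terms in x_j cancel and the log-odds are linear in x, so the decision boundary is a hyperplane; with uniform priors the leading prior term vanishes as well. When variances differ between classes, the quadratic terms remain and the boundary is in general non-linear.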
Whenever decisions are made based on log-probabilities (parameter logprob=True, which is the default), the values conditional attribute, if enabled, also contains log-probabilities. Note also that normalization by the evidence (P(data)) is disabled by default, since it has no impact on the classification decision. Set the parameter normalize to True if you want to access properly scaled probabilities in the values conditional attribute.
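To see why normalization by the evidence does not affect the decision, consider the following sketch. It is an assumption-laden illustration rather than the classifier's own code: the function name `normalize_log_scores` and the example scores are hypothetical. It subtracts log P(data) from unnormalized log-scores using the log-sum-exp trick.

```python
import math

def normalize_log_scores(log_scores):
    """Subtract log P(data) from each unnormalized log-score.

    log P(data) is the log of the sum of exp(score) over all labels,
    computed relative to the largest score for numerical stability.
    """
    peak = max(log_scores.values())
    log_evidence = peak + math.log(
        sum(math.exp(s - peak) for s in log_scores.values())
    )
    return {label: s - log_evidence for label, s in log_scores.items()}

# Hypothetical unnormalized scores: log P(label) + log p(data | label).
log_scores = {"a": -3.2, "b": -5.9, "c": -4.1}

log_posteriors = normalize_log_scores(log_scores)
posteriors = {label: math.exp(v) for label, v in log_posteriors.items()}

# The same constant is subtracted from every label, so the winner is unchanged.
best_raw = max(log_scores, key=log_scores.get)
best_normalized = max(posteriors, key=posteriors.get)
```

After normalization the values sum to one and can be read as posterior probabilities, while the ranking of the labels stays exactly as it was.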
Notes
Available conditional attributes:
(Conditional attributes enabled by default are suffixed with +)
Methods
clone() | Create full copy of the classifier. |
generate(ds) | Yield processing results. |
get_postproc() | Returns the post-processing node or None. |
get_sensitivity_analyzer(**kwargs) | Factory method to return an appropriate sensitivity analyzer for |
get_space() | Query the processing space name of this node. |
is_trained([dataset]) | Whether the classifier was already trained. |
predict(obj, data, *args, **kwargs) | |
repredict(obj, data, *args, **kwargs) | |
reset() | |
retrain(dataset, **kwargs) | Helper to avoid checking whether the data was actually changed |
set_postproc(node) | Assigns a post-processing node |
set_space(name) | Set the processing space name of this node. |
summary() | Provide a summary of the classifier |
train(ds) | The default implementation calls _pretrain(), _train(), and finally _posttrain(). |
untrain() | Reverts changes in the state of this node caused by previous training |
Initialize a GNB classifier.
Parameters :
common_variance : bool, optional
prior : {laplacian_smoothing, uniform, ratio}, optional
logprob : bool, optional
normalize : bool, optional
enable_ca : None or list of str
disable_ca : None or list of str
auto_train : bool
force_train : bool
space : str, optional
pass_attr : str, list of str|tuple, optional
postproc : Node instance, optional
descr : str
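The three values accepted by prior are not described further here. The sketch below shows one plausible reading of their names, and every formula in it is an assumption rather than documented behavior: uniform assigns equal probability to each label, ratio uses the observed class frequencies, and laplacian_smoothing uses add-one smoothed frequencies. The function `estimate_priors` is a hypothetical helper, not part of the classifier.

```python
from collections import Counter

def estimate_priors(labels, prior):
    """Estimate class priors from training labels (assumed semantics)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_labels = len(counts)
    if prior == "uniform":
        # Every label equally likely, regardless of how often it occurs.
        return {label: 1.0 / n_labels for label in counts}
    if prior == "ratio":
        # Observed relative frequency of each label.
        return {label: c / float(n_samples) for label, c in counts.items()}
    if prior == "laplacian_smoothing":
        # Add-one smoothing: pretend each label was seen once more.
        return {
            label: (c + 1.0) / (n_samples + n_labels)
            for label, c in counts.items()
        }
    raise ValueError("unknown prior: %r" % (prior,))

# Hypothetical unbalanced training labels: three "a" and one "b".
labels = ["a", "a", "a", "b"]
uniform = estimate_priors(labels, "uniform")
ratio = estimate_priors(labels, "ratio")
smoothed = estimate_priors(labels, "laplacian_smoothing")
```

Under these assumptions the smoothed estimate falls between the uniform and ratio estimates, which matters mainly for unbalanced sets of samples.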
Means of features per class
Class probabilities
Labels the classifier was trained on
Variances per class, but “vars” is taken ;)