User guide: contents
- 1. Installing scikit-learn
- 2. Tutorials: From the bottom up with scikit-learn
- 2.1. An introduction to machine learning with scikit-learn
- 2.2. A tutorial on statistical-learning for scientific data processing
- 2.2.1. Statistical learning: the setting and the estimator object in the scikit-learn
- 2.2.2. Supervised learning: predicting an output variable from high-dimensional observations
- 2.2.3. Model selection: choosing estimators and their parameters
- 2.2.4. Unsupervised learning: seeking representations of the data
- 2.2.5. Putting it all together
- 2.2.6. Finding help
- 3. Supervised learning
- 3.1. Generalized Linear Models
- 3.1.1. Ordinary Least Squares
- 3.1.2. Ridge Regression
- 3.1.3. Lasso
- 3.1.4. Elastic Net
- 3.1.5. Multi-task Lasso
- 3.1.6. Least Angle Regression
- 3.1.7. LARS Lasso
- 3.1.8. Orthogonal Matching Pursuit (OMP)
- 3.1.9. Bayesian Regression
- 3.1.10. Logistic regression
- 3.1.11. Stochastic Gradient Descent - SGD
- 3.1.12. Perceptron
- 3.1.13. Passive Aggressive Algorithms
- 3.2. Support Vector Machines
- 3.3. Stochastic Gradient Descent
- 3.4. Nearest Neighbors
- 3.5. Gaussian Processes
- 3.6. Partial Least Squares
- 3.7. Naive Bayes
- 3.8. Decision Trees
- 3.9. Ensemble methods
- 3.10. Multiclass and multilabel algorithms
- 3.11. Feature selection
- 3.12. Semi-Supervised
- 3.13. Linear and Quadratic Discriminant Analysis
- 3.14. Isotonic regression
- 4. Unsupervised learning
- 4.1. Gaussian mixture models
- 4.1.1. GMM classifier
- 4.1.2. VBGMM classifier: variational Gaussian mixtures
- 4.1.3. DPGMM classifier: Infinite Gaussian mixtures
- 4.1.3.1. Pros and cons of class DPGMM: Dirichlet process mixture model
- 4.1.3.2. The Dirichlet Process
- 4.2. Manifold learning
- 4.3. Clustering
- 4.3.1. Overview of clustering methods
- 4.3.2. K-means
- 4.3.3. Affinity Propagation
- 4.3.4. Mean Shift
- 4.3.5. Spectral clustering
- 4.3.6. Hierarchical clustering
- 4.3.7. DBSCAN
- 4.3.8. Clustering performance evaluation
- 4.4. Decomposing signals in components (matrix factorization problems)
- 4.5. Covariance estimation
- 4.6. Novelty and Outlier Detection
- 4.7. Hidden Markov Models
- 5. Model selection and evaluation
- 5.1. Cross-Validation: evaluating estimator performance
- 5.2. Grid Search: setting estimator parameters
- 5.3. Pipeline: chaining estimators
- 5.4. FeatureUnion: Combining feature extractors
- 5.5. Model evaluation
- 5.5.1. Classification metrics
- 5.5.1.1. Accuracy score
- 5.5.1.2. Area under the curve (AUC)
- 5.5.1.3. Average precision score
- 5.5.1.4. Confusion matrix
- 5.5.1.5. Classification report
- 5.5.1.6. Precision, recall and F-measures
- 5.5.1.7. Hinge loss
- 5.5.1.8. Matthews correlation coefficient
- 5.5.1.9. Receiver operating characteristic (ROC)
- 5.5.1.10. Zero one loss
- 5.5.2. Regression metrics
- 5.5.3. Clustering metrics
- 5.5.4. Dummy estimators
- 6. Dataset transformations
- 6.1. Preprocessing data
- 6.2. Feature extraction
- 6.2.1. Loading features from dicts
- 6.2.2. Feature hashing
- 6.2.3. Text feature extraction
- 6.2.3.1. The Bag of Words representation
- 6.2.3.2. Sparsity
- 6.2.3.3. Common Vectorizer usage
- 6.2.3.4. Tf–idf term weighting
- 6.2.3.5. Applications and examples
- 6.2.3.6. Limitations of the Bag of Words representation
- 6.2.3.7. Vectorizing a large text corpus with the hashing trick
- 6.2.3.8. Customizing the vectorizer classes
- 6.2.4. Image feature extraction
- 6.3. Kernel Approximation
- 6.4. Random Projection
- 6.5. Pairwise metrics, Affinities and Kernels
- 7. Dataset loading utilities
- 7.1. General dataset API
- 7.2. Toy datasets
- 7.3. Sample images
- 7.4. Sample generators
- 7.5. Datasets in svmlight / libsvm format
- 7.6. The Olivetti faces dataset
- 7.7. The 20 newsgroups text dataset
- 7.8. Downloading datasets from the mldata.org repository
- 7.9. The Labeled Faces in the Wild face recognition dataset
- 7.10. Forest covertypes
- 8. Reference
- 8.1. sklearn.cluster: Clustering
- 8.2. sklearn.covariance: Covariance Estimators
- 8.2.1. sklearn.covariance.EmpiricalCovariance
- 8.2.2. sklearn.covariance.EllipticEnvelope
- 8.2.3. sklearn.covariance.GraphLasso
- 8.2.4. sklearn.covariance.GraphLassoCV
- 8.2.5. sklearn.covariance.LedoitWolf
- 8.2.6. sklearn.covariance.MinCovDet
- 8.2.7. sklearn.covariance.OAS
- 8.2.8. sklearn.covariance.ShrunkCovariance
- 8.2.9. sklearn.covariance.empirical_covariance
- 8.2.10. sklearn.covariance.ledoit_wolf
- 8.2.11. sklearn.covariance.shrunk_covariance
- 8.2.12. sklearn.covariance.oas
- 8.2.13. sklearn.covariance.graph_lasso
- 8.3. sklearn.cross_validation: Cross Validation
- 8.3.1. sklearn.cross_validation.Bootstrap
- 8.3.2. sklearn.cross_validation.KFold
- 8.3.3. sklearn.cross_validation.LeaveOneLabelOut
- 8.3.4. sklearn.cross_validation.LeaveOneOut
- 8.3.5. sklearn.cross_validation.LeavePLabelOut
- 8.3.6. sklearn.cross_validation.LeavePOut
- 8.3.7. sklearn.cross_validation.StratifiedKFold
- 8.3.8. sklearn.cross_validation.ShuffleSplit
- 8.3.9. sklearn.cross_validation.StratifiedShuffleSplit
- 8.3.10. sklearn.cross_validation.train_test_split
- 8.3.11. sklearn.cross_validation.cross_val_score
- 8.3.12. sklearn.cross_validation.permutation_test_score
- 8.3.13. sklearn.cross_validation.check_cv
- 8.4. sklearn.datasets: Datasets
- 8.4.1. Loaders
- 8.4.1.1. sklearn.datasets.fetch_20newsgroups
- 8.4.1.2. sklearn.datasets.fetch_20newsgroups_vectorized
- 8.4.1.3. sklearn.datasets.load_boston
- 8.4.1.4. sklearn.datasets.load_diabetes
- 8.4.1.5. sklearn.datasets.load_digits
- 8.4.1.6. sklearn.datasets.load_files
- 8.4.1.7. sklearn.datasets.load_iris
- 8.4.1.8. sklearn.datasets.load_lfw_pairs
- 8.4.1.9. sklearn.datasets.fetch_lfw_pairs
- 8.4.1.10. sklearn.datasets.load_lfw_people
- 8.4.1.11. sklearn.datasets.fetch_lfw_people
- 8.4.1.12. sklearn.datasets.load_linnerud
- 8.4.1.13. sklearn.datasets.fetch_mldata
- 8.4.1.14. sklearn.datasets.fetch_olivetti_faces
- 8.4.1.15. sklearn.datasets.fetch_california_housing
- 8.4.1.16. sklearn.datasets.load_sample_image
- 8.4.1.17. sklearn.datasets.load_sample_images
- 8.4.1.18. sklearn.datasets.load_svmlight_file
- 8.4.1.19. sklearn.datasets.dump_svmlight_file
- 8.4.2. Samples generator
- 8.4.2.1. sklearn.datasets.make_blobs
- 8.4.2.2. sklearn.datasets.make_classification
- 8.4.2.3. sklearn.datasets.make_circles
- 8.4.2.4. sklearn.datasets.make_friedman1
- 8.4.2.5. sklearn.datasets.make_friedman2
- 8.4.2.6. sklearn.datasets.make_friedman3
- 8.4.2.7. sklearn.datasets.make_hastie_10_2
- 8.4.2.8. sklearn.datasets.make_low_rank_matrix
- 8.4.2.9. sklearn.datasets.make_moons
- 8.4.2.10. sklearn.datasets.make_multilabel_classification
- 8.4.2.11. sklearn.datasets.make_regression
- 8.4.2.12. sklearn.datasets.make_s_curve
- 8.4.2.13. sklearn.datasets.make_sparse_coded_signal
- 8.4.2.14. sklearn.datasets.make_sparse_spd_matrix
- 8.4.2.15. sklearn.datasets.make_sparse_uncorrelated
- 8.4.2.16. sklearn.datasets.make_spd_matrix
- 8.4.2.17. sklearn.datasets.make_swiss_roll
- 8.5. sklearn.decomposition: Matrix Decomposition
- 8.5.1. sklearn.decomposition.PCA
- 8.5.2. sklearn.decomposition.ProbabilisticPCA
- 8.5.3. sklearn.decomposition.ProjectedGradientNMF
- 8.5.4. sklearn.decomposition.RandomizedPCA
- 8.5.5. sklearn.decomposition.KernelPCA
- 8.5.6. sklearn.decomposition.FactorAnalysis
- 8.5.7. sklearn.decomposition.FastICA
- 8.5.8. sklearn.decomposition.NMF
- 8.5.9. sklearn.decomposition.SparsePCA
- 8.5.10. sklearn.decomposition.MiniBatchSparsePCA
- 8.5.11. sklearn.decomposition.SparseCoder
- 8.5.12. sklearn.decomposition.DictionaryLearning
- 8.5.13. sklearn.decomposition.MiniBatchDictionaryLearning
- 8.5.14. sklearn.decomposition.fastica
- 8.5.15. sklearn.decomposition.dict_learning
- 8.5.16. sklearn.decomposition.dict_learning_online
- 8.5.17. sklearn.decomposition.sparse_encode
- 8.6. sklearn.dummy: Dummy estimators
- 8.7. sklearn.ensemble: Ensemble Methods
- 8.7.1. sklearn.ensemble.RandomForestClassifier
- 8.7.2. sklearn.ensemble.RandomTreesEmbedding
- 8.7.3. sklearn.ensemble.RandomForestRegressor
- 8.7.4. sklearn.ensemble.ExtraTreesClassifier
- 8.7.5. sklearn.ensemble.ExtraTreesRegressor
- 8.7.6. sklearn.ensemble.GradientBoostingClassifier
- 8.7.7. sklearn.ensemble.GradientBoostingRegressor
- 8.7.8. Partial dependence
- 8.8. sklearn.feature_extraction: Feature Extraction
- 8.9. sklearn.feature_selection: Feature Selection
- 8.9.1. sklearn.feature_selection.SelectPercentile
- 8.9.2. sklearn.feature_selection.SelectKBest
- 8.9.3. sklearn.feature_selection.SelectFpr
- 8.9.4. sklearn.feature_selection.SelectFdr
- 8.9.5. sklearn.feature_selection.SelectFwe
- 8.9.6. sklearn.feature_selection.RFE
- 8.9.7. sklearn.feature_selection.RFECV
- 8.9.8. sklearn.feature_selection.chi2
- 8.9.9. sklearn.feature_selection.f_classif
- 8.9.10. sklearn.feature_selection.f_regression
- 8.10. sklearn.gaussian_process: Gaussian Processes
- 8.10.1. sklearn.gaussian_process.GaussianProcess
- 8.10.2. sklearn.gaussian_process.correlation_models.absolute_exponential
- 8.10.3. sklearn.gaussian_process.correlation_models.squared_exponential
- 8.10.4. sklearn.gaussian_process.correlation_models.generalized_exponential
- 8.10.5. sklearn.gaussian_process.correlation_models.pure_nugget
- 8.10.6. sklearn.gaussian_process.correlation_models.cubic
- 8.10.7. sklearn.gaussian_process.correlation_models.linear
- 8.10.8. sklearn.gaussian_process.regression_models.constant
- 8.10.9. sklearn.gaussian_process.regression_models.linear
- 8.10.10. sklearn.gaussian_process.regression_models.quadratic
- 8.11. sklearn.grid_search: Grid Search
- 8.12. sklearn.hmm: Hidden Markov Models
- 8.13. sklearn.isotonic: Isotonic regression
- 8.14. sklearn.kernel_approximation: Kernel Approximation
- 8.15. sklearn.semi_supervised: Semi-Supervised Learning
- 8.16. sklearn.lda: Linear Discriminant Analysis
- 8.17. sklearn.linear_model: Generalized Linear Models
- 8.17.1. sklearn.linear_model.ARDRegression
- 8.17.2. sklearn.linear_model.BayesianRidge
- 8.17.3. sklearn.linear_model.ElasticNet
- 8.17.4. sklearn.linear_model.ElasticNetCV
- 8.17.5. sklearn.linear_model.Lars
- 8.17.6. sklearn.linear_model.LarsCV
- 8.17.7. sklearn.linear_model.Lasso
- 8.17.8. sklearn.linear_model.LassoCV
- 8.17.9. sklearn.linear_model.LassoLars
- 8.17.10. sklearn.linear_model.LassoLarsCV
- 8.17.11. sklearn.linear_model.LassoLarsIC
- 8.17.12. sklearn.linear_model.LinearRegression
- 8.17.13. sklearn.linear_model.LogisticRegression
- 8.17.14. sklearn.linear_model.MultiTaskLasso
- 8.17.15. sklearn.linear_model.MultiTaskElasticNet
- 8.17.16. sklearn.linear_model.OrthogonalMatchingPursuit
- 8.17.17. sklearn.linear_model.PassiveAggressiveClassifier
- 8.17.18. sklearn.linear_model.PassiveAggressiveRegressor
- 8.17.19. sklearn.linear_model.Perceptron
- 8.17.20. sklearn.linear_model.RandomizedLasso
- 8.17.21. sklearn.linear_model.RandomizedLogisticRegression
- 8.17.22. sklearn.linear_model.Ridge
- 8.17.23. sklearn.linear_model.RidgeClassifier
- 8.17.24. sklearn.linear_model.RidgeClassifierCV
- 8.17.25. sklearn.linear_model.RidgeCV
- 8.17.26. sklearn.linear_model.SGDClassifier
- 8.17.27. sklearn.linear_model.SGDRegressor
- 8.17.28. sklearn.linear_model.lars_path
- 8.17.29. sklearn.linear_model.lasso_path
- 8.17.30. sklearn.linear_model.lasso_stability_path
- 8.17.31. sklearn.linear_model.orthogonal_mp
- 8.17.32. sklearn.linear_model.orthogonal_mp_gram
- 8.18. sklearn.manifold: Manifold Learning
- 8.19. sklearn.metrics: Metrics
- 8.19.1. Classification metrics
- 8.19.1.1. sklearn.metrics.accuracy_score
- 8.19.1.2. sklearn.metrics.auc
- 8.19.1.3. sklearn.metrics.auc_score
- 8.19.1.4. sklearn.metrics.average_precision_score
- 8.19.1.5. sklearn.metrics.classification_report
- 8.19.1.6. sklearn.metrics.confusion_matrix
- 8.19.1.7. sklearn.metrics.f1_score
- 8.19.1.8. sklearn.metrics.fbeta_score
- 8.19.1.9. sklearn.metrics.hinge_loss
- 8.19.1.10. sklearn.metrics.matthews_corrcoef
- 8.19.1.11. sklearn.metrics.precision_recall_curve
- 8.19.1.12. sklearn.metrics.precision_recall_fscore_support
- 8.19.1.13. sklearn.metrics.precision_score
- 8.19.1.14. sklearn.metrics.recall_score
- 8.19.1.15. sklearn.metrics.roc_curve
- 8.19.1.16. sklearn.metrics.zero_one_loss
- 8.19.2. Regression metrics
- 8.19.3. Clustering metrics
- 8.19.3.1. sklearn.metrics.adjusted_mutual_info_score
- 8.19.3.2. sklearn.metrics.adjusted_rand_score
- 8.19.3.3. sklearn.metrics.completeness_score
- 8.19.3.4. sklearn.metrics.homogeneity_completeness_v_measure
- 8.19.3.5. sklearn.metrics.homogeneity_score
- 8.19.3.6. sklearn.metrics.mutual_info_score
- 8.19.3.7. sklearn.metrics.normalized_mutual_info_score
- 8.19.3.8. sklearn.metrics.silhouette_score
- 8.19.3.9. sklearn.metrics.silhouette_samples
- 8.19.3.10. sklearn.metrics.v_measure_score
- 8.19.4. Pairwise metrics
- 8.19.4.1. sklearn.metrics.pairwise.additive_chi2_kernel
- 8.19.4.2. sklearn.metrics.pairwise.chi2_kernel
- 8.19.4.3. sklearn.metrics.pairwise.distance_metrics
- 8.19.4.4. sklearn.metrics.pairwise.euclidean_distances
- 8.19.4.5. sklearn.metrics.pairwise.kernel_metrics
- 8.19.4.6. sklearn.metrics.pairwise.linear_kernel
- 8.19.4.7. sklearn.metrics.pairwise.manhattan_distances
- 8.19.4.8. sklearn.metrics.pairwise.pairwise_distances
- 8.19.4.9. sklearn.metrics.pairwise.pairwise_kernels
- 8.19.4.10. sklearn.metrics.pairwise.polynomial_kernel
- 8.19.4.11. sklearn.metrics.pairwise.rbf_kernel
- 8.20. sklearn.mixture: Gaussian Mixture Models
- 8.21. sklearn.multiclass: Multiclass and multilabel classification
- 8.21.1. Multiclass and multilabel classification strategies
- 8.21.2. sklearn.multiclass.OneVsRestClassifier
- 8.21.3. sklearn.multiclass.OneVsOneClassifier
- 8.21.4. sklearn.multiclass.OutputCodeClassifier
- 8.21.5. sklearn.multiclass.fit_ovr
- 8.21.6. sklearn.multiclass.predict_ovr
- 8.21.7. sklearn.multiclass.fit_ovo
- 8.21.8. sklearn.multiclass.predict_ovo
- 8.21.9. sklearn.multiclass.fit_ecoc
- 8.21.10. sklearn.multiclass.predict_ecoc
- 8.22. sklearn.naive_bayes: Naive Bayes
- 8.23. sklearn.neighbors: Nearest Neighbors
- 8.23.1. sklearn.neighbors.NearestNeighbors
- 8.23.2. sklearn.neighbors.KNeighborsClassifier
- 8.23.3. sklearn.neighbors.RadiusNeighborsClassifier
- 8.23.4. sklearn.neighbors.KNeighborsRegressor
- 8.23.5. sklearn.neighbors.RadiusNeighborsRegressor
- 8.23.6. sklearn.neighbors.BallTree
- 8.23.7. sklearn.neighbors.NearestCentroid
- 8.23.8. sklearn.neighbors.kneighbors_graph
- 8.23.9. sklearn.neighbors.radius_neighbors_graph
- 8.24. sklearn.pls: Partial Least Squares
- 8.25. sklearn.pipeline: Pipeline
- 8.26. sklearn.preprocessing: Preprocessing and Normalization
- 8.26.1. sklearn.preprocessing.Binarizer
- 8.26.2. sklearn.preprocessing.KernelCenterer
- 8.26.3. sklearn.preprocessing.LabelBinarizer
- 8.26.4. sklearn.preprocessing.LabelEncoder
- 8.26.5. sklearn.preprocessing.MinMaxScaler
- 8.26.6. sklearn.preprocessing.Normalizer
- 8.26.7. sklearn.preprocessing.OneHotEncoder
- 8.26.8. sklearn.preprocessing.StandardScaler
- 8.26.9. sklearn.preprocessing.add_dummy_feature
- 8.26.10. sklearn.preprocessing.balance_weights
- 8.26.11. sklearn.preprocessing.binarize
- 8.26.12. sklearn.preprocessing.normalize
- 8.26.13. sklearn.preprocessing.scale
- 8.27. sklearn.qda: Quadratic Discriminant Analysis
- 8.28. sklearn.random_projection: Random projection
- 8.29. sklearn.svm: Support Vector Machines
- 8.30. sklearn.tree: Decision Trees
- 8.31. sklearn.utils: Utilities