The MNIST dataset of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
See http://yann.lecun.com/exdb/mnist for more information.
Note
The version that is offered here is identical to the four files distributed there, but has been converted into a single HDF5 file than can easily be read by PyMVPA.
Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) hold the copyright of MNIST dataset, which is a derivative work from original NIST datasets. MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.
>>> from mvpa2.suite import *
>>> filepath = os.path.join(pymvpa_datadbroot, 'mnist', "mnist.hdf5")
>>> datasets = h5load(filepath)
>>> train = datasets['train']
>>> test = datasets['test']
>>> print train
<Dataset: 60000x784@uint8, <sa: labels>>
>>> print test
<Dataset: 10000x784@uint8, <sa: labels>>
>>> # assign a mapper able to recreate 28x28 pixel image arrays
>>> test.a.mapper = FlattenMapper(shape=(28, 28))
>>> test.mapper.reverse(test).shape
(10000, 28, 28)
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324.