Welcome to Statsmodels’s Documentation

statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

Minimal Examples

Since version 0.5.0 of statsmodels, you can use R-style formulas together with pandas data frames to fit your models. Here is a simple example using ordinary least squares:

In [1]: import numpy as np

In [2]: import statsmodels.api as sm

In [3]: import statsmodels.formula.api as smf

# Load data
In [4]: dat = sm.datasets.get_rdataset("Guerry", "HistData", cache=True).data

# Fit regression model (using the natural log of one of the regressors)
In [5]: results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

# Inspect the results
In [6]: print(results.summary())
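
The formula example above downloads the Guerry dataset, so it needs network access. As a minimal offline sketch, the same formula interface can be tried on a synthetic data frame. The column names `y`, `x1`, and `x2`, the coefficients, and the seed are made up for illustration and are not part of the Guerry data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical columns, not the Guerry dataset)
rng = np.random.RandomState(0)
nobs = 200
df = pd.DataFrame({
    "x1": rng.uniform(1.0, 10.0, size=nobs),
    "x2": rng.uniform(1.0, 10.0, size=nobs),
})
noise = rng.normal(scale=0.1, size=nobs)
df["y"] = 2.0 + 0.5 * df["x1"] + 3.0 * np.log(df["x2"]) + noise

# R-style formula: np.log is applied to x2 when the design matrix is built,
# and an intercept term is added automatically
formula_results = smf.ols("y ~ x1 + np.log(x2)", data=df).fit()

# Estimated coefficients, indexed by term name
print(formula_results.params)
```

Because the noise is small relative to the signal, the estimated coefficients should land close to the values used to generate `y`.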

You can also use numpy arrays instead of formulas:

In [7]: import numpy as np

In [8]: import statsmodels.api as sm

# Generate artificial data (2 regressors + constant)
In [9]: nobs = 100

In [10]: X = np.random.random((nobs, 2))

In [11]: X = sm.add_constant(X)

In [12]: beta = [1, .1, .5]

In [13]: e = np.random.random(nobs)

In [14]: y = np.dot(X, beta) + e

# Fit regression model
In [15]: results = sm.OLS(y, X).fit()

# Inspect the results
In [16]: print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.183
Model:                            OLS   Adj. R-squared:                  0.166
Method:                 Least Squares   F-statistic:                     10.83
Date:                Mon, 29 Apr 2019   Prob (F-statistic):           5.68e-05
Time:                        19:56:26   Log-Likelihood:                -23.528
No. Observations:                 100   AIC:                             53.06
Df Residuals:                      97   BIC:                             60.87
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4355      0.081     17.716      0.000       1.275       1.596
x1             0.2664      0.101      2.650      0.009       0.067       0.466
x2             0.4224      0.116      3.635      0.000       0.192       0.653
==============================================================================
Omnibus:                       75.567   Durbin-Watson:                   2.054
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.752
Skew:                           0.065   Prob(JB):                       0.0207
Kurtosis:                       1.642   Cond. No.                         5.32
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Have a look at dir(results) to see the available results. Attributes are described in results.__doc__, and the results methods have their own docstrings.

Citation

When using statsmodels in a scientific publication, please consider using the following citation:

Seabold, Skipper, and Josef Perktold. “Statsmodels: Econometric and statistical modeling with python.” Proceedings of the 9th Python in Science Conference. 2010.

Bibtex entry:

@inproceedings{seabold2010statsmodels,
  title={Statsmodels: Econometric and statistical modeling with python},
  author={Seabold, Skipper and Perktold, Josef},
  booktitle={9th Python in Science Conference},
  year={2010},
}
