Linear Mixed Effects Models

Linear Mixed Effects models are used for regression analyses involving dependent data. Such data arise when working with longitudinal and other study designs in which multiple observations are made on each subject. Some specific linear mixed effects models are

  • Random intercepts models, where all responses in a group are additively shifted by a value that is specific to the group.
  • Random slopes models, where the responses in a group follow a (conditional) mean trajectory that is linear in the observed covariates, with the slopes (and possibly intercepts) varying by group.
  • Variance components models, where the levels of one or more categorical covariates are associated with draws from distributions. These random terms additively determine the conditional mean of each observation based on its covariate values.

The Statsmodels implementation of LME is primarily group-based, meaning that random effects must be independently-realized for responses in different groups. There are two types of random effects in our implementation of mixed models: (i) random coefficients (possibly vectors) that have an unknown covariance matrix, and (ii) random coefficients that are independent draws from a common univariate distribution. For both (i) and (ii), the random effects influence the conditional mean of a group through their matrix/vector product with a group-specific design matrix.

A simple example of random coefficients, as in (i) above, is:

\[Y_{ij} = \beta_0 + \beta_1X_{ij} + \gamma_{0i} + \gamma_{1i}X_{ij} + \epsilon_{ij}\]

Here, \(Y_{ij}\) is the \(j\), and \(X_{ij}\) is a covariate for this response. The “fixed effects parameters” \(\beta_0\) and \(\beta_1\) are shared by all subjects, and the errors \(\epsilon_{ij}\) are independent of everything else, and identically distributed (with mean zero). The “random effects parameters” \(gamma_{0i}\) and \(gamma_{1i}\) follow a bivariate distribution with mean zero, described by three parameters: \({\rm var}\gamma_{0i}\), \({\rm var}\gamma_{1i}\), and \({\rm cov}(\gamma_{0i}, \gamma_{1i})\). There is also a parameter for \({\rm var}(\epsilon_{ij})\).

A simple example of variance components, as in (ii) above, is:

\[Y_{ijk} = \beta_0 + \eta_{1i} + \eta_{2j} + \epsilon_{ijk}\]

Here, \(Y_{ijk}\) is the \(k\). The only “mean structure parameter” is \(\beta_0\). The \(\eta_{1i}\) are independent and identically distributed with zero mean, and variance \(\tau_1^2\), and the \(\eta_{2j}\) are independent and identically distributed with zero mean, and variance \(\tau_2^2\).

Statsmodels MixedLM handles most non-crossed random effects models, and some crossed models. To include crossed random effects in a model, it is necessary to treat the entire dataset as a single group. The variance components arguments to the model can then be used to define models with various combinations of crossed and non-crossed random effects.

The Statsmodels LME framework currently supports post-estimation inference via Wald tests and confidence intervals on the coefficients, profile likelihood analysis, likelihood ratio testing, and AIC.

Examples

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset("dietox", "geepack", cache=True).data
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
/usr/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1316                 h.request(req.get_method(), req.selector, req.data, headers,
-> 1317                           encode_chunked=req.has_header('Transfer-encoding'))
   1318             except OSError as err: # timeout error

/usr/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

/usr/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

/usr/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

/usr/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
   1015         del self._buffer[:]
-> 1016         self.send(msg)
   1017 

/usr/lib/python3.7/http/client.py in send(self, data)
    955             if self.auto_open:
--> 956                 self.connect()
    957             else:

/usr/lib/python3.7/http/client.py in connect(self)
   1383 
-> 1384             super().connect()
   1385 

/usr/lib/python3.7/http/client.py in connect(self)
    927         self.sock = self._create_connection(
--> 928             (self.host,self.port), self.timeout, self.source_address)
    929         self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

/usr/lib/python3.7/socket.py in create_connection(address, timeout, source_address)
    706     err = None
--> 707     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    708         af, socktype, proto, canonname, sa = res

/usr/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    747     addrlist = []
--> 748     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    749         af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
<ipython-input-3-dd9bcc886be5> in <module>()
----> 1 data = sm.datasets.get_rdataset("dietox", "geepack", cache=True).data

/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in get_rdataset(dataname, package, cache)
    289                      "master/doc/"+package+"/rst/")
    290     cache = _get_cache(cache)
--> 291     data, from_cache = _get_data(data_base_url, dataname, cache)
    292     data = read_csv(data, index_col=0)
    293     data = _maybe_reset_index(data)

/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in _get_data(base_url, dataname, cache, extension)
    220     url = base_url + (dataname + ".%s") % extension
    221     try:
--> 222         data, from_cache = _urlopen_cached(url, cache)
    223     except HTTPError as err:
    224         if '404' in str(err):

/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in _urlopen_cached(url, cache)
    211     # not using the cache or didn't find it in cache
    212     if not from_cache:
--> 213         data = urlopen(url, timeout=3).read()
    214         if cache is not None:  # then put it in the cache
    215             _cache_it(data, cache_path)

/usr/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    523             req = meth(req)
    524 
--> 525         response = self._open(req, data)
    526 
    527         # post-process response

/usr/lib/python3.7/urllib/request.py in _open(self, req, data)
    541         protocol = req.type
    542         result = self._call_chain(self.handle_open, protocol, protocol +
--> 543                                   '_open', req)
    544         if result:
    545             return result

/usr/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/lib/python3.7/urllib/request.py in https_open(self, req)
   1358         def https_open(self, req):
   1359             return self.do_open(http.client.HTTPSConnection, req,
-> 1360                 context=self._context, check_hostname=self._check_hostname)
   1361 
   1362         https_request = AbstractHTTPHandler.do_request_

/usr/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1317                           encode_chunked=req.has_header('Transfer-encoding'))
   1318             except OSError as err: # timeout error
-> 1319                 raise URLError(err)
   1320             r = h.getresponse()
   1321         except:

URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

In [4]: md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-9efdb46cb28d> in <module>()
----> 1 md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])

KeyError: 'Pig'

In [5]: mdf = md.fit()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-22d27fb7daa8> in <module>()
----> 1 mdf = md.fit()

NameError: name 'md' is not defined

In [6]: print(mdf.summary())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-1bd4bb60f2e8> in <module>()
----> 1 print(mdf.summary())

NameError: name 'mdf' is not defined

Detailed examples can be found here

There are some notebook examples on the Wiki: Wiki notebooks for MixedLM

Technical Documentation

The data are partitioned into disjoint groups. The probability model for group \(i\) is:

\[Y = X\beta + Z\gamma + Q_1\eta_1 + \cdots + Q_k\eta_k + \epsilon\]

where

  • \(n_i\) is the number of observations in group \(i\)
  • \(Y\) is a \(n_i\) dimensional response vector
  • \(X\) is a \(n_i * k_{fe}\) dimensional matrix of fixed effects coefficients
  • \(\beta\) is a \(k_{fe}\)-dimensional vector of fixed effects slopes
  • \(Z\) is a \(n_i * k_{re}\) dimensional matrix of random effects coefficients
  • \(\gamma\) is a \(k_{re}\)-dimensional random vector with mean 0 and covariance matrix \(\Psi\); note that each group gets its own independent realization of gamma.
  • \(Q_j\) is a \(n_i \times q_j\) dimensional design matrix for the \(j^{th}\) variance component.
  • \(\eta_j\) is a \(q_j\)-dimensional random vector containing independent and identically distributed values with variance \(\tau_j^2\).
  • \(\epsilon\) is a \(n_i\) dimensional vector of i.i.d normal errors with mean 0 and variance \(\sigma^2\); the \(\epsilon\) values are independent both within and between groups

\(Y, X, \{Q_j\}\) and \(Z\) must be entirely observed. \(\beta\), \(\Psi\), and \(\sigma^2\) are estimated using ML or REML estimation, and \(\gamma\), \(\{\eta_j\}\) and \(\epsilon\) are random so define the probability model.

The marginal mean structure is \(E[Y|X,Z] = X*\beta\). If only the marginal mean structure is of interest, GEE is a good alternative to mixed models.

Notation:

  • \(cov_{re}\) is the random effects covariance matrix (referred to above as \(\Psi\)) and \(scale\) is the (scalar) error variance. There is also a single estimated variance parameter \(\tau_j^2\) for each variance component. For a single group, the marginal covariance matrix of endog given exog is \(scale*I + Z * cov_{re} * Z\), where \(Z\) is the design matrix for the random effects in one group.

References

The primary reference for the implementation details is:

  • MJ Lindstrom, DM Bates (1988). Newton Raphson and EM algorithms for linear mixed effects models for repeated measures data. Journal of the American Statistical Association. Volume 83, Issue 404, pages 1014-1022.

See also this more recent document:

All the likelihood, gradient, and Hessian calculations closely follow Lindstrom and Bates.

The following two documents are written more from the perspective of users:

Module Reference

The model class is:

MixedLM(endog, exog, groups[, exog_re, …]) An object specifying a linear mixed effects model.

The result class is:

MixedLMResults(model, params, cov_params) Class to contain results of fitting a linear mixed effects model.