Linear Mixed Effects Models¶
Linear Mixed Effects models are used for regression analyses involving dependent data. Such data arise when working with longitudinal and other study designs in which multiple observations are made on each subject. Some specific linear mixed effects models are
- Random intercepts models, where all responses in a group are additively shifted by a value that is specific to the group.
- Random slopes models, where the responses in a group follow a (conditional) mean trajectory that is linear in the observed covariates, with the slopes (and possibly intercepts) varying by group.
- Variance components models, where the levels of one or more categorical covariates are associated with draws from distributions. These random terms additively determine the conditional mean of each observation based on its covariate values.
The Statsmodels implementation of LME is primarily group-based, meaning that random effects must be independently-realized for responses in different groups. There are two types of random effects in our implementation of mixed models: (i) random coefficients (possibly vectors) that have an unknown covariance matrix, and (ii) random coefficients that are independent draws from a common univariate distribution. For both (i) and (ii), the random effects influence the conditional mean of a group through their matrix/vector product with a group-specific design matrix.
A simple example of random coefficients, as in (i) above, is:
Here, \(Y_{ij}\) is the \(j\), and \(X_{ij}\) is a covariate for this response. The “fixed effects parameters” \(\beta_0\) and \(\beta_1\) are shared by all subjects, and the errors \(\epsilon_{ij}\) are independent of everything else, and identically distributed (with mean zero). The “random effects parameters” \(gamma_{0i}\) and \(gamma_{1i}\) follow a bivariate distribution with mean zero, described by three parameters: \({\rm var}\gamma_{0i}\), \({\rm var}\gamma_{1i}\), and \({\rm cov}(\gamma_{0i}, \gamma_{1i})\). There is also a parameter for \({\rm var}(\epsilon_{ij})\).
A simple example of variance components, as in (ii) above, is:
Here, \(Y_{ijk}\) is the \(k\). The only “mean structure parameter” is \(\beta_0\). The \(\eta_{1i}\) are independent and identically distributed with zero mean, and variance \(\tau_1^2\), and the \(\eta_{2j}\) are independent and identically distributed with zero mean, and variance \(\tau_2^2\).
Statsmodels MixedLM handles most non-crossed random effects models, and some crossed models. To include crossed random effects in a model, it is necessary to treat the entire dataset as a single group. The variance components arguments to the model can then be used to define models with various combinations of crossed and non-crossed random effects.
The Statsmodels LME framework currently supports post-estimation inference via Wald tests and confidence intervals on the coefficients, profile likelihood analysis, likelihood ratio testing, and AIC.
Examples¶
In [1]: import statsmodels.api as sm
In [2]: import statsmodels.formula.api as smf
In [3]: data = sm.datasets.get_rdataset("dietox", "geepack", cache=True).data
---------------------------------------------------------------------------
gaierror Traceback (most recent call last)
/usr/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1316 h.request(req.get_method(), req.selector, req.data, headers,
-> 1317 encode_chunked=req.has_header('Transfer-encoding'))
1318 except OSError as err: # timeout error
/usr/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
1228 """Send a complete request to the server."""
-> 1229 self._send_request(method, url, body, headers, encode_chunked)
1230
/usr/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1274 body = _encode(body, 'body')
-> 1275 self.endheaders(body, encode_chunked=encode_chunked)
1276
/usr/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
1223 raise CannotSendHeader()
-> 1224 self._send_output(message_body, encode_chunked=encode_chunked)
1225
/usr/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
1015 del self._buffer[:]
-> 1016 self.send(msg)
1017
/usr/lib/python3.7/http/client.py in send(self, data)
955 if self.auto_open:
--> 956 self.connect()
957 else:
/usr/lib/python3.7/http/client.py in connect(self)
1383
-> 1384 super().connect()
1385
/usr/lib/python3.7/http/client.py in connect(self)
927 self.sock = self._create_connection(
--> 928 (self.host,self.port), self.timeout, self.source_address)
929 self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
/usr/lib/python3.7/socket.py in create_connection(address, timeout, source_address)
706 err = None
--> 707 for res in getaddrinfo(host, port, 0, SOCK_STREAM):
708 af, socktype, proto, canonname, sa = res
/usr/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
747 addrlist = []
--> 748 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
749 af, socktype, proto, canonname, sa = res
gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
URLError Traceback (most recent call last)
<ipython-input-3-dd9bcc886be5> in <module>()
----> 1 data = sm.datasets.get_rdataset("dietox", "geepack", cache=True).data
/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in get_rdataset(dataname, package, cache)
289 "master/doc/"+package+"/rst/")
290 cache = _get_cache(cache)
--> 291 data, from_cache = _get_data(data_base_url, dataname, cache)
292 data = read_csv(data, index_col=0)
293 data = _maybe_reset_index(data)
/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in _get_data(base_url, dataname, cache, extension)
220 url = base_url + (dataname + ".%s") % extension
221 try:
--> 222 data, from_cache = _urlopen_cached(url, cache)
223 except HTTPError as err:
224 if '404' in str(err):
/build/statsmodels-0.9.0/.pybuild/cpython3_3.7_statsmodels/build/statsmodels/datasets/utils.py in _urlopen_cached(url, cache)
211 # not using the cache or didn't find it in cache
212 if not from_cache:
--> 213 data = urlopen(url, timeout=3).read()
214 if cache is not None: # then put it in the cache
215 _cache_it(data, cache_path)
/usr/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):
/usr/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
523 req = meth(req)
524
--> 525 response = self._open(req, data)
526
527 # post-process response
/usr/lib/python3.7/urllib/request.py in _open(self, req, data)
541 protocol = req.type
542 result = self._call_chain(self.handle_open, protocol, protocol +
--> 543 '_open', req)
544 if result:
545 return result
/usr/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 for handler in handlers:
502 func = getattr(handler, meth_name)
--> 503 result = func(*args)
504 if result is not None:
505 return result
/usr/lib/python3.7/urllib/request.py in https_open(self, req)
1358 def https_open(self, req):
1359 return self.do_open(http.client.HTTPSConnection, req,
-> 1360 context=self._context, check_hostname=self._check_hostname)
1361
1362 https_request = AbstractHTTPHandler.do_request_
/usr/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1317 encode_chunked=req.has_header('Transfer-encoding'))
1318 except OSError as err: # timeout error
-> 1319 raise URLError(err)
1320 r = h.getresponse()
1321 except:
URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
In [4]: md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])