Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster or repeated measures data when the observations are possibly correlated within a cluster but uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).

See Module Reference for commands and arguments.

Examples

The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures. The dataset is fetched with get_rdataset, so running the example requires network access.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS', cache=True).data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=ind, family=fam)
   ...: 

In [7]: res = mod.fit()

In [8]: print(res.summary())

Several notebook examples of GEE usage are available on the statsmodels Wiki under “Wiki notebooks for GEE”.

References

  • KY Liang and S Zeger (1986). “Longitudinal data analysis using generalized linear models”. Biometrika, 73(1), 13-22.
  • S Zeger and KY Liang (1986). “Longitudinal Data Analysis for Discrete and Continuous Outcomes”. Biometrics, 42(1), 121-130.
  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”. Biometrika, 77, 485-497.
  • Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
  • LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57(1), 126-134.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …]) Estimation of marginal regression models using Generalized Estimating Equations (GEE).

Results Classes

GEEResults(model, params, cov_params, scale) This class summarizes the fit of a marginal regression model using GEE.
GEEMargins(results, args[, kwargs]) Estimated marginal effects for a regression model fit with GEE.

Dependence Structures

The dependence structures currently implemented are:

CovStruct([cov_nearest_method]) A base class for correlation and covariance structures of grouped data.
Autoregressive([dist_func]) A first-order autoregressive working dependence structure.
Exchangeable() An exchangeable working dependence structure.
GlobalOddsRatio(endog_type) Estimate the global odds ratio for a GEE with ordinal or nominal data.
Independence([cov_nearest_method]) An independence working dependence structure.
Nested([cov_nearest_method]) A nested working dependence structure.

Families

The distribution families are the same as for GLM. Those currently implemented are:

Family(link, variance) The parent class for one-parameter exponential families.
Binomial([link]) Binomial exponential family distribution.
Gamma([link]) Gamma exponential family distribution.
Gaussian([link]) Gaussian exponential family distribution.
InverseGaussian([link]) InverseGaussian exponential family.
NegativeBinomial([link, alpha]) Negative Binomial exponential family.
Poisson([link]) Poisson exponential family.