Regression analysis for correlated data

KY Liang, SL Zeger - Annual review of public health, 1993 - annualreviews.org
Regression analysis is among the most commonly used methods of statistical analysis in public health research. Its objective is to describe the relationship of a response with explanatory variables. One example of a regression problem is to identify factors associated with the racial difference in the risk of low birthweight (29). Regression includes the following as special cases: linear models for measured responses, logistic models for binary responses, and survival analyses for times to events. A basic assumption of regression analysis is that all observations are statistically independent, or at least uncorrelated with each other. In the low birthweight example, this assumption would mean that knowing one child's birthweight status provides no information as to whether another child in the study has a low birthweight. One may argue that the assumption of independence is unlikely to be true if children of the same mother are included in the sample. Because siblings share a household environment and genes, we would expect a child to have a greater chance of having a low birthweight if his or her sibling did. Data from this hypothetical example can usefully be thought of as being "clustered" into families. Birthweights from different families are likely independent; those from the same cluster are not. This dependence among observations from the same cluster must be accounted for in assessing the relationship between risk factors and health outcomes.
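The clustering argument can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a hypothetical model in which each family has its own random shift on the log-odds of low birthweight, and the function names and parameter values (`base_logit`, `family_sd`, the number of families) are invented for illustration. Under these assumptions, siblings' outcomes should be positively correlated, while outcomes of children from different families should be roughly uncorrelated.

```python
import numpy as np


def simulate_sibling_pairs(n_families=20000, base_logit=-2.0, family_sd=1.5, seed=0):
    """Simulate low-birthweight indicators for two children per family.

    All parameter values are illustrative assumptions, not estimates
    from any real study.
    """
    rng = np.random.default_rng(seed)
    # Shared family-level effect: stands in for common environment and genes.
    family_effect = rng.normal(0.0, family_sd, size=n_families)
    # Each family's probability of low birthweight (inverse logit).
    prob = 1.0 / (1.0 + np.exp(-(base_logit + family_effect)))
    # Given the family probability, the two siblings are drawn independently.
    outcomes = rng.random((n_families, 2)) < prob[:, None]
    return outcomes.astype(int)


def pair_correlation(a, b):
    """Pearson correlation between two 0/1 outcome vectors."""
    return float(np.corrcoef(a, b)[0, 1])


pairs = simulate_sibling_pairs()

# Correlation between siblings in the same family (same cluster).
within = pair_correlation(pairs[:, 0], pairs[:, 1])

# Correlation between children from different families: shift the second
# column by one row so each first child is matched to an unrelated child.
between = pair_correlation(pairs[:, 0], np.roll(pairs[:, 1], 1))

print(f"within-family correlation:  {within:.3f}")
print(f"between-family correlation: {between:.3f}")
```

In this sketch the dependence arises only through the shared family effect; setting `family_sd` to zero would make siblings as independent as unrelated children, which is the situation standard regression assumes.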