Log-linear models for categorical data with misclassification and double sampling

TT Chen - Journal of the American Statistical Association, 1979 - Taylor & Francis
Abstract
Errors in the collection of categorical data lead to misclassification of observed counts. Several authors have proposed a double sampling scheme. This article develops a method for analysis of double sampling data. First, a log-linear model is selected for the misclassification matrix that relates the fallible to the correct data; then another log-linear model is built on the distribution of the correct classifications. Thus, the error structure can be utilized in inference of the relationships among the correct classifications. The statistical principles used are maximum likelihood estimation and goodness-of-fit tests. An example from epidemiology illustrates the methodology.
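
As a rough illustration of how double sampling data can feed maximum likelihood estimation, the sketch below covers only the simplest special case: a saturated (unrestricted) model, where the likelihood factors into the fallible marginal and the distribution of the correct class given the fallible class. It is not the log-linear procedure developed in the article. The function name, argument layout, and counts are hypothetical, and the sketch assumes every fallible category occurs at least once in the validated subsample.

```python
def double_sampling_mle(validation, fallible_only):
    """Saturated-model ML estimate of the correct-class proportions.

    validation[i][j]: units in the validated subsample whose correct class
                      is i and whose fallible class is j (both observed).
    fallible_only[j]: units in the main sample with fallible class j
                      (correct class not observed).
    Assumes every fallible class j appears in the validated subsample.
    """
    n_true = len(validation)
    n_fallible = len(fallible_only)

    # Fallible-class totals within the validated subsample.
    col_totals = [
        sum(validation[i][j] for i in range(n_true)) for j in range(n_fallible)
    ]
    grand_total = sum(col_totals) + sum(fallible_only)

    p_true = []
    for i in range(n_true):
        p = 0.0
        for j in range(n_fallible):
            # P(fallible = j) is estimated from all units in both samples.
            p_fallible_j = (col_totals[j] + fallible_only[j]) / grand_total
            # P(correct = i | fallible = j) uses the validated units only.
            p_true_given_j = validation[i][j] / col_totals[j]
            p += p_true_given_j * p_fallible_j
        p_true.append(p)
    return p_true


# Hypothetical counts: 100 validated units and 400 fallible-only units.
validation_counts = [[40, 10], [5, 45]]
fallible_only_counts = [200, 200]
estimates = double_sampling_mle(validation_counts, fallible_only_counts)
print(estimates)
```

A log-linear model that restricts the misclassification matrix or the correct-class distribution would generally not have a closed form like this and would need iterative fitting.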