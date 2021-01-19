The field of epidemiology, though not bereft of trials, draws its conclusions predominantly from observational data. The first sets of epidemiologic rules of judgment for ascertaining causality, nearly identical to each other, were published on both sides of the Atlantic nearly simultaneously (5, 6), apparently not quite independently (7). The US version provides a handy quintet of criteria (consistency, strength, specificity, temporal relationship, and coherence) with which to judge the likelihood of any exposure-disease association being causal. For the purpose of evaluating treatment, temporal association, i.e., that the exposure or treatment preceded the onset of disease, is generally a given, and the specific treatment and the specific outcome of interest are usually a clear focus of the research. With temporality and specificity thus largely settled, that leaves three criteria to consider when thinking of observational research in relation to treatments in medicine.

Strength. Strength refers to the size of the observed difference, not the P value associated with it; the P value assesses only the role of chance in creating the association, whatever its strength. If a judgment about treatment is to be made on the basis of observational data, the effect size had better be substantial. Confounders and biases inevitably arise when study arms are not made comparable through random assignment. Since a confounding factor can fully account for an observed association only if its own effect on the outcome is at least as large as the effect being claimed for the treatment, a large effect size limits the likelihood that confounding or bias alone explains the result; a 50% reduction in mortality from treatment is much harder to confound than a 20% reduction.
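
One way to put rough numbers on this reasoning is the E-value of VanderWeele and Ding, a sensitivity measure that is not part of the criteria discussed here and is offered only as an illustrative sketch. It estimates the minimum risk ratio that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away an observed risk ratio. The function name `e_value` and the treatment of a "50% reduction" as a risk ratio of 0.50 are assumptions made for this example.

```python
import math


def e_value(rr: float) -> float:
    """Return the E-value for an observed risk ratio (illustrative helper)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula
    # is applied on the RR >= 1 scale.
    if rr < 1:
        rr = 1.0 / rr
    # E-value = RR + sqrt(RR * (RR - 1))
    return rr + math.sqrt(rr * (rr - 1.0))


# Compare the two mortality reductions mentioned in the text,
# assuming each reduction is expressed as a risk ratio of 1 - reduction.
for reduction in (0.50, 0.20):
    rr = 1.0 - reduction
    print(f"{reduction:.0%} reduction (RR = {rr:.2f}): E-value = {e_value(rr):.2f}")
```

Under these assumptions, the 50% reduction yields an E-value of about 3.41 and the 20% reduction about 1.81: a confounder would need risk-ratio associations of roughly 3.4 with both treatment and outcome to explain away the larger effect, but only about 1.8 to explain away the smaller one.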

Coherence. Does the intervention make sense in light of what else we know? A prominent component of this criterion is mechanism of action. Few interventions are undertaken without a hypothesized mechanism of action, but the evidence supporting the mechanism can come from a variety of sources, and in vitro does not always translate to in vivo, especially in humans.

Consistency. A treatment repeatedly shown to be effective is more likely to be truly effective than one which seems effective in some studies but not in others.

We suggest one additional criterion that has stood the test of time, and that is the use of total population data to draw conclusions. RCTs are conducted in individuals willing to be enrolled in a study and to accept randomization. Such individuals are usually younger, healthier, more educated, and less likely to be from minority populations. Generalization from trials can therefore be uncertain, but total population data have sample sizes thousands of times larger than any trial and exclude virtually no one.

Some of the best evidence for the effectiveness of cancer screening comes from three observations: the consistent declines in mortality rates for the four cancers universally screened for in the US (breast, colon, cervix, and prostate), the correspondence of these declines with the onset of screening, and the paucity of alternative explanations for the declines (8, 9). Several studies of whether newborn intensive care reduces neonatal mortality have been based on total population data sources, which, without exception, show lower mortality in high-risk newborns born where intensive care was available (10). These cross-sectional assessments have been amply supported by time-trend findings from vital data in the total US population (11).